Abstract

Digital contents in large-scale distributed storage systems may have different reliability and access 
delay requirements, and erasure codes with different strengths can provide the best storage efficiency 
in these systems. At the same time, in such large-scale distributed storage systems, nodes fail on 
a regular basis, and the contents stored on them need to be regenerated from the data downloaded 
from the remaining nodes. The efficiency of this repair process is an important factor that affects the 
overall quality of service. In this work, we formulate the problem of multilevel diversity coding with 
regeneration to address these considerations, for which the storage vs. repair-bandwidth tradeoff is 
investigated. We show that the extreme point on the optimal tradeoff curve that corresponds to the 
minimum possible storage can be achieved by a simple coding scheme, in which contents with different 
reliability requirements are encoded separately with individual regenerating codes without any mixing. 
On the other hand, we establish the complete storage-repair-bandwidth tradeoff for the case of four 
storage nodes, which reveals that codes mixing different contents can, in general, strictly improve the 
optimal tradeoff over the separate-coding solution. 

Keywords: Data storage, multilevel diversity coding, regenerating codes. 


1 Introduction 

The importance of big-data analytics has been widely recognized in recent years. However, efficient 
large-scale distributed data storage systems have to be designed and implemented in order to support 
the complete pipeline of data collection, processing and archival on a scale that has never been put into 
practice before. Advanced coding techniques have been shown to be helpful in terms of providing both 
performance improvement and cost reduction in such systems. 

Digital contents in large-scale distributed storage systems usually have different reliability requirements.
For example, although it is important to protect recent customer billing records with a very
reliable code, it may be acceptable to allow the data loss probability of a five-year-old office document
backup to be higher by using a weaker code. Moreover, erasure codes can also be used to reduce data
access queuing delays; see [1]–[3] and references therein. Thus, different levels of latency can also be
integrated into the same data storage system by adopting different coding parameters for different contents.
Such flexibility can significantly reduce the cost of hardware infrastructure, and there has recently been a
tremendous amount of interest in both industry and academia in designing efficient software-defined storage
(SDS) systems that utilize flexible erasure codes. The theoretical framework of symmetrical multilevel
diversity (MLD) coding [4], [5] is a natural fit for this scenario, where a total of $k_0$ independent messages
$(M_1, M_2, \ldots, M_{k_0})$ are to be stored in $n \ge k_0$ storage nodes situated in different network locations, each
with $\alpha$ units of data. The messages are coded in such a way that by accessing any $k \le k_0$ of these nodes,
the first $k$ messages $(M_1, M_2, \ldots, M_k)$ can be completely recovered.

Disk or node failures occur regularly in a large-scale data storage system, and the overall quality
of service is heavily affected by the efficiency of the repair process. Dimakis et al. [6] proposed the
framework of regenerating codes to address the tradeoff between the storage and repair-bandwidth in
$(n, k)$ erasure-code-based distributed storage systems. To repair a node, a new node replacing the failed
one requests $\beta$ units of data from each of any $d$ remaining nodes, and regenerates the $\alpha$ units of
content to store on the new node; this code is referred to as an $(n, k, d)$ regenerating code. There
exists a natural tradeoff between the storage $\alpha$ and the repair bandwidth $\beta$: the point corresponding to
the minimum amount of storage is referred to as the minimum storage regenerating (MSR) point,
and the other extreme, corresponding to the minimum amount of repair bandwidth, is referred to as the
minimum repair-bandwidth regenerating (MBR) point. In [6], the regenerated content is allowed to be
only functionally equivalent to the original content stored on the failed node, hence the name
“functional-repair” regenerating codes. In practice, requiring the regenerated content to be exactly the same as that
stored on the failed node can simplify the system design significantly, and thus recent research effort has
focused on “exact-repair” regenerating codes [7]–[13].


In the current regenerating code framework, only a single message is allowed, and thus only a single 
level of reliability and access latency is offered. On the other hand, in the classical MLD coding framework, 
the data repair process was not considered. In this work, we consider repair-efficient codes in systems 
with heterogeneous reliability and latency requirements, and investigate the optimal storage vs. repair- 
bandwidth tradeoff. Because of the connection to the MLD coding and regenerating code problems, 
we refer to this problem as multilevel diversity coding with regeneration (MLD-R) in the sequel (see
Fig. 1 for an illustration of the system). We shall restrict our attention to the case of exact repair and,
furthermore, to the case $d = n - 1$, because this is the most practically important case. Nevertheless,
the proposed framework can be generalized to other relevant settings in a straightforward fashion.

An intuitive and straightforward coding strategy for MLD-R is to use an individual regenerating code 
for each message to satisfy the respective reliability and latency requirement (i.e., separate coding), and 
thus an important question that we wish to answer first is whether it is even beneficial to consider codes 
that “mix” the messages. Without the repair consideration, it was shown in [4], [5] that mixing is not
necessary for the (symmetrical) MLD coding problem. As we shall see shortly, for the minimum storage
point on the optimal tradeoff curve where $\alpha$ is minimized (analogous to the MSR point in standard
regenerating codes), the aforementioned separate-coding strategy is again sufficient. On the other hand, 
we show for n = 4, by providing a novel code construction, that mixing can strictly improve upon the 
performance of the separate-coding solution in terms of the overall storage-repair-bandwidth tradeoff. In 
fact, we completely characterize the optimal tradeoff for this case by establishing its converse. It is worth 
noting that when n = 3, separate coding is sufficient, thus n = 4 is the smallest non-trivial case where 
the benefit of mixing manifests. 

The main difficulty for establishing the aforementioned results is in deriving the tight outer bounds. 
For the minimum storage point, we utilize a recursive bounding technique which may be of independent 
interest. The converse for the tradeoff rate region when n = 4 is rather difficult to identify and derive 
analytically, and our approach is to utilize the computational method developed in [13]. The proof is thus
presented in tables whose rows are simple known information inequalities, and the summations of the
rows give precisely the desired outer bounds. Though this does not conform to the conventional approach 
of using chains of information inequalities in information theory literature, we believe that these tables 
are, in fact, more fundamental: We can write down many different versions of chains of inequalities with 
the help of these tables, by taking different orders when applying these individual inequalities. 

The rest of the paper is organized as follows. A formal problem formulation and some preliminaries
are given in Section 2. In Section 3, the main results of the paper are presented together with the relevant
discussions. The proofs are given in Sections 4 and 5. Section 6 concludes the paper with a few possible
future research directions. Several technical proofs are given in the Appendix.

Figure 1: Multilevel diversity coding with regeneration.

2 Problem Formulation and Preliminaries 

2.1 Problem Formulation 

An MLD-R code is formally defined below, where $I_n$ denotes the set $\{1, 2, \ldots, n\}$ and $|A|$ denotes the
cardinality of a set $A$. Without loss of generality, we may assume that the number of nodes accessed
during repair, i.e., the parameter $d$, is the same as $k_0$, the number of messages. This is
because if $d < k_0$, then the messages$^1$ $(M_{d+1}, \ldots, M_{k_0})$ can be viewed as part of $M_d$, as they can
all be reconstructed by accessing any $d$ nodes. On the other hand, if $d > k_0$, we can simply consider an
alternative problem with $k_0' = d$ and define the messages $M_{k_0+1}, \ldots, M_d$ to be degenerate (i.e., with
rate zero). Recall that we shall assume $d = n - 1$ for the rest of the paper.

Definition 1. An $(N_1, N_2, \ldots, N_d, K_d, K)$ MLD-R code consists of $n$ encoding functions $f_i^E(\cdot)$, $\sum_{i=1}^{d}\binom{n}{i}$
decoding functions $f_A^D(\cdot, \ldots, \cdot)$, $n(n-1)$ repair-encoding functions $F_{i,j}^E(\cdot)$, and $n$ repair-decoding functions
$F_j^D(\cdot, \ldots, \cdot)$, where

• $f_i^E: I_{N_1} \times I_{N_2} \times \cdots \times I_{N_d} \to I_{K_d}$ for $i \in I_n$, each of which maps the messages $(M_1, M_2, \ldots, M_d) \in
I_{N_1} \times I_{N_2} \times \cdots \times I_{N_d}$ to one piece of coded information to be stored on node $i$;

• $f_A^D: I_{K_d}^{|A|} \to I_{N_1} \times I_{N_2} \times \cdots \times I_{N_{|A|}}$ for $A \subseteq I_n$ and $|A| = 1, 2, \ldots, d$, each of which
maps the coded information stored on a set $A$ of nodes to the first $|A|$ messages $(M_1, M_2, \ldots, M_{|A|})$;

• $F_{i,j}^E: I_{K_d} \to I_K$ for $j \in I_n$ and $i \in I_n \setminus \{j\}$, each of which maps a piece of coded information at
node $i$ to an index that is made available to regenerate the coded data stored at node $j$; and

• $F_j^D: I_K^{d} \to I_{K_d}$ for $j \in I_n$, each of which maps $d$ such indices from the helper nodes
$I_n \setminus \{j\}$ to regenerate the information stored at the failed node $j$.

The functions must satisfy: 

1) the data-reconstruction conditions
$$f_A^D\big(f_i^E(M_1, M_2, \ldots, M_d),\ i \in A\big) = (M_1, M_2, \ldots, M_{|A|}), \quad \forall (M_1, M_2, \ldots, M_d) \in I_{N_1} \times I_{N_2} \times \cdots \times I_{N_d},\ A \subseteq I_n,\ |A| = 1, 2, \ldots, d; \tag{1}$$

2) and the node-regeneration conditions
$$F_j^D\big(F_{i,j}^E(f_i^E(M_1, M_2, \ldots, M_d)),\ i \in I_n \setminus \{j\}\big) = f_j^E(M_1, M_2, \ldots, M_d), \quad \forall (M_1, M_2, \ldots, M_d) \in I_{N_1} \times I_{N_2} \times \cdots \times I_{N_d},\ j \in I_n. \tag{2}$$

$^1$For readers familiar with [4], [5], the messages here correspond to the independent sources in [4], [5]. Our problem can be
alternatively defined using such sources at the expense of more sophisticated notation.


Note that $\alpha = \log K_d$ is the storage node capacity, $\beta = \log K$ is the per-helper-node repair bandwidth,
and $B_i = \log N_i$ is the rate of the $i$-th message. The base of $\log(\cdot)$ is arbitrary, and we choose base 2 for
convenience. As a concrete example, consider the case with $n = 3$ nodes. Each of the three nodes has a
storage capacity $\alpha$. There are two messages $M_1$ and $M_2$, the first of which needs to be reconstructed by
accessing any one node, and the latter of which needs to be reconstructed by accessing any two nodes.
Any single node failure needs to be repairable by using the remaining two nodes, each of which contributes
$\beta$ amount of helper data. Because of the linear-scaling relation among them, we can alternatively consider
the normalized versions of $\alpha$, $\beta$ and $B_i$ as follows.


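Definition 2. A normalized storage-repair-bandwidth-message-rate tuple $(\bar\alpha, \bar\beta, \bar B_1, \bar B_2, \ldots, \bar B_d)$, where
$\sum_{j=1}^{d} \bar B_j = 1$, is said to be achievable with $n$ nodes if there exists an $(N_1, N_2, \ldots, N_d, K_d, K)$ MLD-R code
such that
$$\bar\alpha \ge \frac{\log K_d}{\sum_{i=1}^{d} \log N_i}, \qquad \bar\beta \ge \frac{\log K}{\sum_{i=1}^{d} \log N_i}, \qquad \text{and} \qquad \bar B_j = \frac{\log N_j}{\sum_{i=1}^{d} \log N_i}, \quad j = 1, 2, \ldots, d.$$
The closure of all achievable $(\bar\alpha, \bar\beta, \bar B_1, \bar B_2, \ldots, \bar B_d)$ tuples is the achievable normalized storage-repair-bandwidth-message-rate
tradeoff region $\mathcal{R}_n$. For a fixed $(\bar B_1, \bar B_2, \ldots, \bar B_d)$ tuple, the achievable normalized
storage-repair-bandwidth tradeoff region is the collection of all $(\bar\alpha, \bar\beta)$ pairs such that $(\bar\alpha, \bar\beta, \bar B_1, \bar B_2, \ldots, \bar B_d) \in \mathcal{R}_n$,
which is denoted as $\mathcal{R}_n(\bar B_1, \bar B_2, \ldots, \bar B_d)$.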
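As a small illustration of Definition 2, the sketch below computes the normalized tuple from raw sizes. It assumes (our assumption, not part of the definition) that $\alpha$, $\beta$ and all message sizes are counted in symbols of one common alphabet, so the base of the logarithm and the field size cancel in the ratios; the inequalities are taken with equality, and the helper name `normalize` is hypothetical.

```python
from fractions import Fraction


def normalize(alpha, beta, message_sizes):
    """Return (alpha_bar, beta_bar, (B_1_bar, ..., B_d_bar)) as in Definition 2.

    Assumption: alpha, beta and every message size are counted in symbols
    of one common alphabet, so the log base and field size cancel out.
    The inequalities of Definition 2 are taken with equality here.
    """
    total = sum(message_sizes)
    if total <= 0:
        raise ValueError("total message size must be positive")
    alpha_bar = Fraction(alpha, total)
    beta_bar = Fraction(beta, total)
    rates = tuple(Fraction(size, total) for size in message_sizes)
    return alpha_bar, beta_bar, rates


# Symbol counts of the n = 4 code in Proposition 1:
# alpha = 4, beta = 2, (B_1, B_2, B_3) = (0, 3, 6).
print(normalize(4, 2, [0, 3, 6]))
```

For these sizes the normalized pair is $(\frac49, \frac29)$ with rates $(0, \frac13, \frac23)$, which is the point that appears in Proposition 1.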


The codes and the tradeoff regions do not involve any particular assumption on the distribution of
the messages. However, without loss of generality we may assume that the messages $M_1, M_2, \ldots, M_d$
are mutually independent and uniformly distributed, since otherwise we can perform a pre-coding to
eliminate any dependency and non-uniformity. Sometimes it is convenient to use the accumulative sum
rates instead of the individual rates, and we thus define
$$\bar B_k^{\Sigma} = \sum_{i=1}^{k} \bar B_i, \qquad k = 1, 2, \ldots, d. \tag{3}$$
Note that this definition implies $\bar B_d^{\Sigma} = 1$, even though we often still write $\bar B_d^{\Sigma}$ for convenience.

The data-reconstruction condition (1) requires that there is no decoding error, i.e., the zero-error
requirement is adopted. An alternative definition is to require, instead, the probability of decoding error
to vanish in the limit as $\prod_{i=1}^{d} N_i \to \infty$. It will become clear that this does not cause any essential
difference, and we thus do not further discuss this alternative definition in this paper.

When deriving outer bounds, we use $S_{i\to j}$ to denote the random variable representing the helper data
sent from node $i$ to node $j$ during the repair of node $j$, i.e., the output of the function $F_{i,j}^E$, and $W_i$ to
denote the random variable representing the coded data stored on node $i$, i.e., the output of the function
$f_i^E$. The random vector $(X_1, X_2, \ldots, X_m)$ is sometimes written as $X_1^m$ for notational simplicity.


2.2 Separate Coding 

One straightforward coding strategy is to encode each individual message separately using a regenerating 
code of the necessary parameters. More precisely, suppose that each message $M_k$ is encoded using an
$(n, k, d = n - 1)$ regenerating code (i.e., any $k$ nodes can recover the message $M_k$, and any new node
obtains data from any $d$ nodes for repair) of rate $(\alpha_k, \beta_k)$. Then, the resulting code has storage and repair
rates given by
$$\alpha = \sum_{k=1}^{d} \alpha_k \qquad \text{and} \qquad \beta = \sum_{k=1}^{d} \beta_k,$$


respectively. Therefore, if we assume that the normalized rate pair $(\bar\alpha_k, \bar\beta_k)$ is achievable by an $(n, k, d =
n - 1)$ regenerating code, the normalized rate pair
$$(\bar\alpha, \bar\beta) = \Big(\sum_{k=1}^{d} \bar\alpha_k \bar B_k,\ \sum_{k=1}^{d} \bar\beta_k \bar B_k\Big) \tag{4}$$
is achievable by separate encoding. The collection of all such normalized rate pairs $(\bar\alpha, \bar\beta)$, over all achievable
normalized rate pairs $(\bar\alpha_k, \bar\beta_k)$ for the individual $(n, k, d = n - 1)$ regenerating codes, is the separate-coding
normalized tradeoff region and is denoted as $\widetilde{\mathcal{R}}_n(\bar B_1, \bar B_2, \ldots, \bar B_d)$.
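A minimal sketch of the combination rule (4): it takes one operating point per individual regenerating code and weights it by the normalized message rate. The function name and the sample rates $(0, \frac13, \frac23)$ are illustrative choices; the per-code points used in the example are the standard MSR points $(\frac1k, \frac{1}{k(n-k)})$ of $(n, k, d = n - 1)$ codes, taken here as an assumption.

```python
from fractions import Fraction as F


def separate_coding_point(per_message_points, rates):
    """Normalized (alpha_bar, beta_bar) of separate coding, eq. (4).

    per_message_points[k-1] is an achievable normalized pair
    (alpha_k, beta_k) for the (n, k, d = n - 1) regenerating code that
    carries message M_k; rates[k-1] is the normalized rate B_k.
    """
    if len(per_message_points) != len(rates):
        raise ValueError("need one operating point per message")
    alpha = sum(a * b for (a, _), b in zip(per_message_points, rates))
    beta = sum(bt * b for (_, bt), b in zip(per_message_points, rates))
    return alpha, beta


# n = 4 with every code at its MSR point (1/k, 1/(k(n-k))), k = 1, 2, 3.
n = 4
msr = [(F(1, k), F(1, k * (n - k))) for k in range(1, n)]
print(separate_coding_point(msr, [F(0), F(1, 3), F(2, 3)]))
```

Exact fractions are used so that the resulting pair can be compared with the closed-form bounds without rounding.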

In order to characterize the separate-coding normalized tradeoff region $\widetilde{\mathcal{R}}_n(\bar B_1, \bar B_2, \ldots, \bar B_d)$, normalized
tradeoff region characterizations of the individual regenerating codes are needed. For example, for the case
$n = 3$, normalized tradeoff region characterizations for $(3, 1, 2)$ and $(3, 2, 2)$ regenerating codes are
needed. However, such characterizations for general parameters are still unknown, except for the cases
$k = 1, 2$ (for arbitrary $d$ and $n$) and the special case $(n, k, d) = (4, 3, 3)$ recently established in [13].
Fortunately, using these existing results, we can provide precise characterizations of the separate-coding
normalized tradeoff regions for MLD-R when $n = 3, 4$.

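Lemma 1. The separate-coding normalized tradeoff region $\widetilde{\mathcal{R}}_3(\bar B_1, \bar B_2)$ is the set of $(\bar\alpha, \bar\beta)$ pairs satisfying
the following conditions:
$$\bar\alpha \ge \bar B_1 + \frac{\bar B_2}{2}, \qquad \bar\alpha + \bar\beta \ge \frac{3\bar B_1}{2} + \bar B_2, \qquad \text{and} \qquad \bar\beta \ge \frac{\bar B_1}{2} + \frac{\bar B_2}{3}. \tag{5}$$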


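Lemma 2. The separate-coding normalized tradeoff region $\widetilde{\mathcal{R}}_4(\bar B_1, \bar B_2, \bar B_3)$ is the set of $(\bar\alpha, \bar\beta)$ pairs
satisfying the following conditions:
$$\bar\alpha \ge \bar B_1 + \frac{\bar B_2}{2} + \frac{\bar B_3}{3}, \tag{6}$$
$$2\bar\alpha + \bar\beta \ge \frac{7\bar B_1}{3} + \frac{5\bar B_2}{4} + \bar B_3, \tag{7}$$
$$4\bar\alpha + 6\bar\beta \ge 6\bar B_1 + \frac{7\bar B_2}{2} + 3\bar B_3, \tag{8}$$
$$\bar\alpha + 2\bar\beta \ge \frac{5\bar B_1}{3} + \bar B_2 + \frac{5\bar B_3}{6}, \tag{9}$$
$$\text{and} \qquad \bar\beta \ge \frac{\bar B_1}{3} + \frac{\bar B_2}{5} + \frac{\bar B_3}{6}. \tag{10}$$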


The proofs of these lemmas are given in the Appendix. 
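The constants in Lemmas 1 and 2 can be sanity-checked numerically. The sketch below assumes the corner points of the individual exact-repair regions listed in `CORNERS`: the MSR and MBR points for $k = 1, 2$, and the three corner points $(\frac13, \frac13)$, $(\frac38, \frac14)$, $(\frac12, \frac16)$ for the $(4, 3, 3)$ code of [13]. This table is our input, not something stated in the lemmas. Because separate coding sums scaled copies of these regions, the tight lower bound on a nonnegative combination $c_\alpha\bar\alpha + c_\beta\bar\beta$ carries, on each $\bar B_k$, the minimum of $c_\alpha\alpha_k + c_\beta\beta_k$ over the corners of code $k$. This checks only the right-hand sides of (5)-(10), not the proofs; the helper name is hypothetical.

```python
from fractions import Fraction as F

# Assumed corner points of the normalized exact-repair region of each
# individual (n, k, n-1) code: MSR point, intermediate points, MBR point.
CORNERS = {
    3: {1: [(F(1), F(1, 2))],
        2: [(F(1, 2), F(1, 2)), (F(2, 3), F(1, 3))]},
    4: {1: [(F(1), F(1, 3))],
        2: [(F(1, 2), F(1, 4)), (F(3, 5), F(1, 5))],
        3: [(F(1, 3), F(1, 3)), (F(3, 8), F(1, 4)), (F(1, 2), F(1, 6))]},
}


def separate_coding_bound(n, c_alpha, c_beta):
    """Coefficient of each B_k in the tight lower bound on
    c_alpha * alpha_bar + c_beta * beta_bar under separate coding.

    For nonnegative weights the minimum over a sum of scaled regions is
    the sum of the per-code minima, each attained at a corner point.
    """
    return [min(c_alpha * a + c_beta * b for a, b in CORNERS[n][k])
            for k in sorted(CORNERS[n])]


# Normals of Lemma 1 (n = 3) and Lemma 2 (n = 4).
for n, normals in [(3, [(1, 0), (1, 1), (0, 1)]),
                   (4, [(1, 0), (2, 1), (4, 6), (1, 2), (0, 1)])]:
    for ca, cb in normals:
        print(n, (ca, cb), [str(x) for x in separate_coding_bound(n, ca, cb)])
```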


3 Main Results 

Our first main result is a precise characterization of the extreme point of the tradeoff rate region
$\mathcal{R}_n(\bar B_1, \bar B_2, \ldots, \bar B_d)$ where $\bar\alpha$ is minimized.
Theorem 1. For any $(\bar\alpha, \bar\beta) \in \mathcal{R}_n(\bar B_1, \bar B_2, \ldots, \bar B_d)$, we have
$$(n-2)\bar\alpha + \bar\beta \ge \sum_{k=1}^{n-1} \frac{(n-2)(n-k)+1}{k(n-k)}\, \bar B_k. \tag{11}$$
Moreover, the minimum storage point of $(\bar\alpha, \bar\beta) \in \mathcal{R}_n(\bar B_1, \bar B_2, \ldots, \bar B_d)$ is given as
$$(\bar\alpha, \bar\beta) = \Big(\sum_{k=1}^{n-1} \frac{\bar B_k}{k},\ \sum_{k=1}^{n-1} \frac{\bar B_k}{k(n-k)}\Big), \tag{12}$$
which can be achieved by separately coding each message $M_k$ with an $(n, k, d = n - 1)$ exact-repair MSR
code.

The outer bound (11) is proved in the Appendix. To see the second part of the theorem, recall that
for MLD coding without the repair consideration, the minimum normalized storage rate is given in [4] as
$$\sum_{k=1}^{n-1} \frac{\bar B_k}{k}. \tag{13}$$
Note that this minimum normalized storage rate is also achievable with the additional repair consideration,
as the entire collection of the messages $(M_1, M_2, \ldots, M_d)$ can be recovered by downloading the data
stored at any $d$ storage nodes. Plugging the minimum normalized storage rate (13) into (11) gives
$$\bar\beta \ge \sum_{k=1}^{n-1} \frac{\bar B_k}{k(n-k)}. \tag{14}$$


On the other hand, for each $(n, k, d = n - 1)$ exact-repair MSR code, the following MSR point is known
to be achievable:
$$(\bar\alpha_k, \bar\beta_k) = \Big(\frac{1}{k},\ \frac{1}{k(n-k)}\Big). \tag{15}$$


Thus the minimum storage point (12) can be achieved by the aforementioned separate-coding strategy,
when we let each individual exact-repair regenerating code operate at its respective MSR point.
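The achievability argument for (12) can be checked with exact arithmetic. The sketch evaluates the point (12) for a sample rate vector and confirms that it meets (11) with equality, which is what makes it the minimum-storage extreme point; it relies on the per-message identity $\frac{n-2}{k} + \frac{1}{k(n-k)} = \frac{(n-2)(n-k)+1}{k(n-k)}$. The function names and the sample rates for $n = 5$ are hypothetical.

```python
from fractions import Fraction as F


def min_storage_point(n, rates):
    """Minimum-storage point (12) for normalized rates B_1, ..., B_{n-1}."""
    assert len(rates) == n - 1
    alpha = sum(b / k for k, b in enumerate(rates, start=1))
    beta = sum(b / (k * (n - k)) for k, b in enumerate(rates, start=1))
    return alpha, beta


def bound_11_rhs(n, rates):
    """Right-hand side of the outer bound (11)."""
    return sum(F((n - 2) * (n - k) + 1, k * (n - k)) * b
               for k, b in enumerate(rates, start=1))


n, rates = 5, [F(1, 10), F(2, 10), F(3, 10), F(4, 10)]
a, b = min_storage_point(n, rates)
print(a, b, (n - 2) * a + b == bound_11_rhs(n, rates))
```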

Our next two results provide complete characterizations of $\mathcal{R}_3(\bar B_1, \bar B_2)$ and $\mathcal{R}_4(\bar B_1, \bar B_2, \bar B_3)$.


Theorem 2. $\mathcal{R}_3(\bar B_1, \bar B_2) = \widetilde{\mathcal{R}}_3(\bar B_1, \bar B_2)$.


We obviously have $\widetilde{\mathcal{R}}_3(\bar B_1, \bar B_2) \subseteq \mathcal{R}_3(\bar B_1, \bar B_2)$, and the reverse inclusion $\mathcal{R}_3(\bar B_1, \bar B_2) \subseteq \widetilde{\mathcal{R}}_3(\bar B_1, \bar B_2)$ is
proved in the Appendix. The theorem states that for the case of $n = 3$, the separate-coding strategy is
optimal, and there is no need to mix the messages. However, our next result shows that this is not the case
in general, and mixing the messages can be (strictly) beneficial.


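Theorem 3. The normalized storage-repair-bandwidth tradeoff region $\mathcal{R}_4(\bar B_1, \bar B_2, \bar B_3)$ is the collection of
all $(\bar\alpha, \bar\beta)$ pairs that satisfy the following constraints:
$$\bar\alpha \ge \bar B_1 + \frac{\bar B_2}{2} + \frac{\bar B_3}{3}, \tag{16}$$
$$2\bar\alpha + \bar\beta \ge \frac{7\bar B_1}{3} + \frac{5\bar B_2}{4} + \bar B_3, \tag{17}$$
$$\bar\alpha + \bar\beta \ge \frac{4\bar B_1}{3} + \frac{3\bar B_2}{4} + \frac{5\bar B_3}{8}, \tag{18}$$
$$2\bar\alpha + 3\bar\beta \ge 3\bar B_1 + \frac{5\bar B_2}{3} + \frac{3\bar B_3}{2}, \tag{19}$$
$$\bar\alpha + 2\bar\beta \ge \frac{5\bar B_1}{3} + \bar B_2 + \frac{5\bar B_3}{6}, \tag{20}$$
$$\text{and} \qquad \bar\beta \ge \frac{\bar B_1}{3} + \frac{\bar B_2}{5} + \frac{\bar B_3}{6}. \tag{21}$$

Figure 2: The normalized separate-coding rate region $\widetilde{\mathcal{R}}_4(0, \frac13, \frac23)$ and the normalized tradeoff rate region $\mathcal{R}_4(0, \frac13, \frac23)$.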

The proof of the theorem is provided in Section 5. The regions $\widetilde{\mathcal{R}}_4(0, \frac13, \frac23)$ and $\mathcal{R}_4(0, \frac13, \frac23)$ are
depicted in Fig. 2. It can be seen that the inclusion $\widetilde{\mathcal{R}}_4(0, \frac13, \frac23) \subseteq \mathcal{R}_4(0, \frac13, \frac23)$ is strict; thus, in general,
mixing of contents in MLD-R can be beneficial. We formally state this fact next.

Corollary 1. $\widetilde{\mathcal{R}}_4(\bar B_1, \bar B_2, \bar B_3) \subsetneq \mathcal{R}_4(\bar B_1, \bar B_2, \bar B_3)$ if and only if $\bar B_2 \bar B_3 > 0$.

Since the general forms of $\widetilde{\mathcal{R}}_4(\bar B_1, \bar B_2, \bar B_3)$ and $\mathcal{R}_4(\bar B_1, \bar B_2, \bar B_3)$ are different, it is expected that they
are not identical except in certain degenerate cases. These degenerate cases turn out to be precisely
when either $\bar B_2 = 0$ or $\bar B_3 = 0$. This corollary is proved in the Appendix.
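To make the gap between Lemma 2 and Theorem 3 concrete, the sketch below encodes both constraint sets as coefficient tables and tests the rate pair $(\frac49, \frac29)$ of Proposition 1 at $(\bar B_1, \bar B_2, \bar B_3) = (0, \frac13, \frac23)$. The coefficients are transcribed from (6)-(10) and (16)-(21); the helper `violated` is hypothetical. The point satisfies (16)-(21), with equality in (18)-(20), but violates (8), so it lies in $\mathcal{R}_4$ and outside $\widetilde{\mathcal{R}}_4$, in line with Corollary 1 for $\bar B_2 \bar B_3 > 0$.

```python
from fractions import Fraction as F

# Each constraint is (c_alpha, c_beta, (coef_B1, coef_B2, coef_B3)), meaning
# c_alpha * alpha_bar + c_beta * beta_bar >= coef . (B1, B2, B3).
THEOREM_3 = [
    (1, 0, (F(1), F(1, 2), F(1, 3))),       # (16)
    (2, 1, (F(7, 3), F(5, 4), F(1))),       # (17)
    (1, 1, (F(4, 3), F(3, 4), F(5, 8))),    # (18)
    (2, 3, (F(3), F(5, 3), F(3, 2))),       # (19)
    (1, 2, (F(5, 3), F(1), F(5, 6))),       # (20)
    (0, 1, (F(1, 3), F(1, 5), F(1, 6))),    # (21)
]
LEMMA_2 = [
    (1, 0, (F(1), F(1, 2), F(1, 3))),       # (6)
    (2, 1, (F(7, 3), F(5, 4), F(1))),       # (7)
    (4, 6, (F(6), F(7, 2), F(3))),          # (8)
    (1, 2, (F(5, 3), F(1), F(5, 6))),       # (9)
    (0, 1, (F(1, 3), F(1, 5), F(1, 6))),    # (10)
]


def violated(constraints, alpha, beta, rates):
    """Return the (0-based) indices of the constraints that (alpha, beta) violates."""
    bad = []
    for idx, (ca, cb, coefs) in enumerate(constraints):
        rhs = sum(c * b for c, b in zip(coefs, rates))
        if ca * alpha + cb * beta < rhs:
            bad.append(idx)
    return bad


rates = (F(0), F(1, 3), F(2, 3))
point = (F(4, 9), F(2, 9))                   # the rate pair of Proposition 1
print(violated(THEOREM_3, *point, rates))    # [] : satisfies (16)-(21)
print(violated(LEMMA_2, *point, rates))      # [2] : violates (8)
```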


4 The Minimum Storage Point: Proof of Theorem 1


It can be shown that we only need to consider symmetric codes, where permutations of node indices
do not change the induced joint entropy values. See [13] for more details about this type of symmetry.
Thus, without loss of generality, we may restrict the proof to symmetric codes only. Before presenting the
proof of Theorem 1, we first present an auxiliary lemma, which will play an important role in the
induction proof of Theorem 1. The proof of the lemma makes use of the celebrated Han's inequality [14] and is
partly motivated by the converse proof of the (symmetrical) MLD coding problem (without the regeneration
requirement) given in [4].
Lemma 3. For any integer $\ell$ such that $1 \le \ell \le d - 1$ and any symmetric MLD-R code with a total of $n$
nodes, we have
$$\frac{n-1}{\ell(n-\ell)} H(W_1^{\ell}|M_1^{\ell}) + \frac{n-\ell-1}{n-\ell} H(S_{\ell+1\to1}, \ldots, S_{n\to1}, W_1^{\ell}|M_1^{\ell}) \ge B_{\ell+1} + H(S_{\ell+2\to1}, \ldots, S_{n\to1}, W_1^{\ell+1}|M_1^{\ell+1}). \tag{22}$$


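Proof. Let $\ell$ be an integer such that $1 \le \ell \le d - 1$. We start by writing
$$\begin{aligned}
&H(W_1^{\ell}|M_1^{\ell}) + H(S_{\ell+1\to1}, \ldots, S_{n\to1}, W_1^{\ell}|M_1^{\ell})\\
&\overset{(s)}{=} H(W_2^{\ell+1}|M_1^{\ell}) + H(S_{\ell+1\to1}, \ldots, S_{n\to1}, W_1^{\ell}|M_1^{\ell})\\
&\overset{(a)}{=} H(S_{\ell+1\to1}, W_2^{\ell+1}|M_1^{\ell}) + H(S_{\ell+1\to1}, \ldots, S_{n\to1}, W_1^{\ell}|M_1^{\ell})\\
&\overset{(b)}{\ge} H(S_{\ell+2\to1}, \ldots, S_{n\to1}, W_1^{\ell+1}|M_1^{\ell}) + H(S_{\ell+1\to1}, W_2^{\ell}|M_1^{\ell})\\
&\overset{(c)}{=} H(S_{\ell+2\to1}, \ldots, S_{n\to1}, W_1^{\ell+1}, M_{\ell+1}|M_1^{\ell}) + H(S_{\ell+1\to1}, W_2^{\ell}|M_1^{\ell})\\
&\overset{(d)}{=} B_{\ell+1} + H(S_{\ell+2\to1}, \ldots, S_{n\to1}, W_1^{\ell+1}|M_1^{\ell+1}) + H(S_{\ell+1\to1}, W_2^{\ell}|M_1^{\ell}),
\end{aligned} \tag{23}$$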


where we write, from now on, (s) to denote “the reason of symmetry”. Here, (a) is due to the fact that $S_{\ell+1\to1}$
is a function of $W_{\ell+1}$; (b) follows from the standard submodularity of the entropy function, where $S_{\ell+1\to1}$ is
dropped from the joint term because it is a function of $W_{\ell+1}$; (c) is due to the fact that $M_{\ell+1}$ can be
recovered from $W_1^{\ell+1}$; and (d) follows from the assumption that $M_{\ell+1}$ is independent of $M_1, \ldots, M_{\ell}$ and
uniform over $I_{N_{\ell+1}}$. Further notice that

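$$\begin{aligned}
(n-\ell)H(S_{\ell+1\to1}, W_2^{\ell}|M_1^{\ell}) &= (n-\ell)H(W_2^{\ell}|M_1^{\ell}) + (n-\ell)H(S_{\ell+1\to1}|M_1^{\ell}, W_2^{\ell})\\
&\overset{(s)}{=} (n-\ell)H(W_2^{\ell}|M_1^{\ell}) + \sum_{i=\ell+1}^{n} H(S_{i\to1}|M_1^{\ell}, W_2^{\ell})\\
&\ge (n-\ell)H(W_2^{\ell}|M_1^{\ell}) + H(S_{\ell+1\to1}, \ldots, S_{n\to1}|M_1^{\ell}, W_2^{\ell})\\
&= (n-\ell)H(W_2^{\ell}|M_1^{\ell}) + H(S_{\ell+1\to1}, \ldots, S_{n\to1}, W_1|M_1^{\ell}, W_2^{\ell})\\
&= (n-\ell-1)H(W_2^{\ell}|M_1^{\ell}) + H(S_{\ell+1\to1}, \ldots, S_{n\to1}, W_1^{\ell}|M_1^{\ell})\\
&\ge \frac{(n-\ell-1)(\ell-1)}{\ell} H(W_1^{\ell}|M_1^{\ell}) + H(S_{\ell+1\to1}, \ldots, S_{n\to1}, W_1^{\ell}|M_1^{\ell}),
\end{aligned} \tag{24}$$
where the fourth step holds because $W_1$ can be regenerated from $S_{2\to1}, \ldots, S_{n\to1}$, of which $S_{2\to1}, \ldots, S_{\ell\to1}$
are functions of $W_2^{\ell}$, and in the last step we applied the conditional version of Han's inequality [14] under the symmetry
assumption:
$$\ell\, H(W_2^{\ell}|M_1^{\ell}) \ge (\ell-1)\, H(W_1^{\ell}|M_1^{\ell}). \tag{25}$$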
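Han's inequality, used in (25) and again in (34), states that for any random variables $X_1, \ldots, X_\ell$, $\sum_{i=1}^{\ell} H(X_{\{1,\ldots,\ell\}\setminus\{i\}}) \ge (\ell-1) H(X_1, \ldots, X_\ell)$; under the symmetry assumption all $\ell$ leave-one-out terms equal $H(W_2^{\ell}|M_1^{\ell})$, which gives (25). The sketch below is only a numerical sanity check of the unconditional form on random joint distributions of three binary variables, not a proof; the helper names are hypothetical.

```python
import itertools
import math
import random


def entropy(joint, coords):
    """Entropy (bits) of the marginal of `joint` on the given coordinates.

    `joint` maps outcome tuples to probabilities.
    """
    marginal = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in coords)
        marginal[key] = marginal.get(key, 0.0) + p
    return -sum(p * math.log2(p) for p in marginal.values() if p > 0)


def han_gap(joint, n):
    """sum_i H(all but X_i) - (n - 1) H(X_1, ..., X_n); Han's inequality says >= 0."""
    full = entropy(joint, range(n))
    leave_one_out = sum(entropy(joint, [j for j in range(n) if j != i])
                        for i in range(n))
    return leave_one_out - (n - 1) * full


random.seed(0)
n = 3
outcomes = list(itertools.product(range(2), repeat=n))
weights = [random.random() for _ in outcomes]
total = sum(weights)
joint = {o: w / total for o, w in zip(outcomes, weights)}
print(han_gap(joint, n) >= -1e-12)
```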


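Putting (23) and (24) together gives
$$\begin{aligned}
&H(W_1^{\ell}|M_1^{\ell}) + H(S_{\ell+1\to1}, \ldots, S_{n\to1}, W_1^{\ell}|M_1^{\ell})\\
&\ge B_{\ell+1} + H(S_{\ell+2\to1}, \ldots, S_{n\to1}, W_1^{\ell+1}|M_1^{\ell+1}) + \frac{(n-\ell-1)(\ell-1)}{\ell(n-\ell)} H(W_1^{\ell}|M_1^{\ell}) + \frac{1}{n-\ell} H(S_{\ell+1\to1}, \ldots, S_{n\to1}, W_1^{\ell}|M_1^{\ell}),
\end{aligned} \tag{26}$$
which has common terms involving $H(W_1^{\ell}|M_1^{\ell})$ and $H(S_{\ell+1\to1}, \ldots, S_{n\to1}, W_1^{\ell}|M_1^{\ell})$ on both sides that can
be eliminated. Since $1 - \frac{(n-\ell-1)(\ell-1)}{\ell(n-\ell)} = \frac{n-1}{\ell(n-\ell)}$ and $1 - \frac{1}{n-\ell} = \frac{n-\ell-1}{n-\ell}$, this leads to exactly the inequality stated in the lemma. □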
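The cancellation that turns (26) into (22) is pure coefficient arithmetic, so it can be checked with exact rationals over a range of $n$ and $\ell$. A minimal sketch; the function name is hypothetical:

```python
from fractions import Fraction as F


def lemma3_coefficients(n, l):
    """Coefficients left on H(W_1^l | M_1^l) and on the joint S/W term
    after cancelling the common terms on both sides of (26)."""
    w_coef = 1 - F((n - l - 1) * (l - 1), l * (n - l))
    s_coef = 1 - F(1, n - l)
    return w_coef, s_coef


for n in range(3, 12):
    for l in range(1, n - 1):  # 1 <= l <= d - 1 = n - 2
        w, s = lemma3_coefficients(n, l)
        assert w == F(n - 1, l * (n - l))   # coefficient in (22)
        assert s == F(n - l - 1, n - l)     # coefficient in (22)
print("coefficients of (22) confirmed")
```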









In the proof of the lemma above, after several steps of derivation, the same terms as in the original
quantity reappear, albeit with different coefficients. These terms are subtracted from both sides of the
inequality, which can be conceptually viewed as recursively applying the same chain of inequalities. We
are now ready to present the proof of Theorem 1.

Proof of Theorem 1. The theorem is proved through an induction, where we show that for $m = 1, 2, \ldots, d$,
$$(n-2)\alpha + \beta \ge \sum_{k=1}^{m} \frac{(n-2)(n-k)+1}{k(n-k)} B_k + \Big(\frac{n-2}{m} - \frac{m-1}{m(n-m)}\Big) H(W_1^{m}|M_1^{m}) + \frac{1}{n-m} H(S_{m+1\to1}, \ldots, S_{n\to1}, W_1^{m}|M_1^{m}). \tag{27}$$


The theorem is then simply a consequence of this statement when setting $m = d = n - 1$, normalizing
both sides by $\sum_{k=1}^{d} B_k$, and taking into account the facts that
$$\frac{n-2}{m} - \frac{m-1}{m(n-m)} = 0 \quad \text{for } m = n - 1 \tag{28}$$
and
$$H(S_{m+1\to1}, \ldots, S_{n\to1}, W_1^{m}|M_1^{m}) \ge 0. \tag{29}$$


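To show that (27) is true for $m = 1$, we write the following chain of inequalities:
$$\begin{aligned}
(n-2)\alpha + \beta &\ge (n-2)H(W_1) + H(S_{2\to1})\\
&= (n-2)H(W_1) + \frac{(n-1)H(S_{2\to1})}{n-1}\\
&\overset{(s)}{\ge} (n-2)H(W_1) + \frac{H(S_{2\to1}, S_{3\to1}, \ldots, S_{n\to1})}{n-1}\\
&= (n-2)H(W_1) + \frac{H(S_{2\to1}, S_{3\to1}, \ldots, S_{n\to1}, W_1)}{n-1}\\
&= (n-2)B_1 + (n-2)H(W_1|M_1) + \frac{H(S_{2\to1}, S_{3\to1}, \ldots, S_{n\to1}, W_1)}{n-1}\\
&= (n-2)B_1 + (n-2)H(W_1|M_1) + \frac{B_1 + H(S_{2\to1}, S_{3\to1}, \ldots, S_{n\to1}, W_1|M_1)}{n-1}\\
&= \frac{(n-2)(n-1)+1}{n-1} B_1 + (n-2)H(W_1|M_1) + \frac{H(S_{2\to1}, S_{3\to1}, \ldots, S_{n\to1}, W_1|M_1)}{n-1}.
\end{aligned} \tag{30}$$
Thus (27) is true for $m = 1$.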

Next, suppose that (27) is true for $m = m_0$ for some $m_0 \le d - 1$, and we wish to show that it is also
true for $m = m_0 + 1$. Notice that by Lemma 3 with $\ell = m_0$, after dividing both sides by $n - m_0 - 1$, we have
$$\begin{aligned}
&\frac{n-1}{m_0(n-m_0)(n-m_0-1)} H(W_1^{m_0}|M_1^{m_0}) + \frac{1}{n-m_0} H(S_{m_0+1\to1}, \ldots, S_{n\to1}, W_1^{m_0}|M_1^{m_0})\\
&\ge \frac{B_{m_0+1}}{n-m_0-1} + \frac{1}{n-m_0-1} H(S_{m_0+2\to1}, \ldots, S_{n\to1}, W_1^{m_0+1}|M_1^{m_0+1}).
\end{aligned} \tag{31}$$
It thus follows from the induction assumption that


$$\begin{aligned}
(n-2)\alpha + \beta
&\ge \sum_{k=1}^{m_0} \frac{(n-2)(n-k)+1}{k(n-k)} B_k + \Big(\frac{n-2}{m_0} - \frac{m_0-1}{m_0(n-m_0)}\Big) H(W_1^{m_0}|M_1^{m_0})\\
&\qquad + \frac{1}{n-m_0} H(S_{m_0+1\to1}, \ldots, S_{n\to1}, W_1^{m_0}|M_1^{m_0})\\
&\overset{(a)}{\ge} \sum_{k=1}^{m_0} \frac{(n-2)(n-k)+1}{k(n-k)} B_k + \Big(\frac{n-2}{m_0} - \frac{1}{n-m_0-1}\Big) H(W_1^{m_0}|M_1^{m_0})\\
&\qquad + \frac{B_{m_0+1}}{n-m_0-1} + \frac{1}{n-m_0-1} H(S_{m_0+2\to1}, \ldots, S_{n\to1}, W_1^{m_0+1}|M_1^{m_0+1})\\
&\overset{(b)}{\ge} \sum_{k=1}^{m_0} \frac{(n-2)(n-k)+1}{k(n-k)} B_k + \Big(\frac{n-2}{m_0+1} - \frac{m_0}{(n-m_0-1)(m_0+1)}\Big) H(W_1^{m_0+1}|M_1^{m_0})\\
&\qquad + \frac{B_{m_0+1}}{n-m_0-1} + \frac{1}{n-m_0-1} H(S_{m_0+2\to1}, \ldots, S_{n\to1}, W_1^{m_0+1}|M_1^{m_0+1})\\
&\overset{(c)}{=} \sum_{k=1}^{m_0} \frac{(n-2)(n-k)+1}{k(n-k)} B_k + \Big(\frac{n-2}{m_0+1} - \frac{m_0}{(n-m_0-1)(m_0+1)} + \frac{1}{n-m_0-1}\Big) B_{m_0+1}\\
&\qquad + \Big(\frac{n-2}{m_0+1} - \frac{m_0}{(n-m_0-1)(m_0+1)}\Big) H(W_1^{m_0+1}|M_1^{m_0+1})\\
&\qquad + \frac{1}{n-m_0-1} H(S_{m_0+2\to1}, \ldots, S_{n\to1}, W_1^{m_0+1}|M_1^{m_0+1}),
\end{aligned} \tag{32}$$
which is precisely (27) for $m = m_0 + 1$ by noting that
$$\frac{n-2}{m_0+1} - \frac{m_0}{(n-m_0-1)(m_0+1)} + \frac{1}{n-m_0-1} = \frac{(n-2)(n-m_0-1)+1}{(n-m_0-1)(m_0+1)}. \tag{33}$$


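Here, (a) follows from (31) together with the identity $\frac{m_0-1}{m_0(n-m_0)} + \frac{n-1}{m_0(n-m_0)(n-m_0-1)} = \frac{1}{n-m_0-1}$;
(b) follows, once again, from the conditional version of Han's inequality under the symmetry assumption:
$$\frac{1}{m_0} H(W_1^{m_0}|M_1^{m_0}) \ge \frac{1}{m_0+1} H(W_1^{m_0+1}|M_1^{m_0}) \tag{34}$$
and the fact that
$$\frac{n-2}{m_0} - \frac{1}{n-m_0-1} \ge 0, \qquad \forall m_0 = 1, 2, \ldots, d - 1;$$
and (c) follows from the simple fact that
$$\begin{aligned}
H(W_1^{m_0+1}|M_1^{m_0}) &= H(W_1^{m_0+1}, M_{m_0+1}|M_1^{m_0})\\
&= H(M_{m_0+1}|M_1^{m_0}) + H(W_1^{m_0+1}|M_1^{m_0+1})\\
&= B_{m_0+1} + H(W_1^{m_0+1}|M_1^{m_0+1}).
\end{aligned}$$
This completes the proof of Theorem 1. □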
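All the coefficient bookkeeping in the induction (the identity behind step (a), the nonnegativity used in step (b), the scaling by $\frac{m_0}{m_0+1}$ that Han's inequality introduces, (28) and (33)) can be checked with exact rational arithmetic. The sketch below does so for $3 \le n < 15$; it checks the algebra only, not the information inequalities, and the helper names are hypothetical. As a side observation, for $n = 4$ the coefficients of $\bar B_k$ in (11) evaluate to $\frac73$, $\frac54$ and $1$, the same as on the right-hand side of (17).

```python
from fractions import Fraction as F


def c(n, k):
    """Coefficient of B_k in (27) (and in (11))."""
    return F((n - 2) * (n - k) + 1, k * (n - k))


def w_coef(n, m):
    """Coefficient of H(W_1^m | M_1^m) in (27)."""
    return F(n - 2, m) - F(m - 1, m * (n - m))


for n in range(3, 15):
    assert w_coef(n, n - 1) == 0                                   # (28)
    for m0 in range(1, n - 1):                                    # m0 = 1, ..., d - 1
        # step (a): hypothesis coefficient minus the part consumed by (31)
        a_step = w_coef(n, m0) - F(n - 1, m0 * (n - m0) * (n - m0 - 1))
        assert a_step == F(n - 2, m0) - F(1, n - m0 - 1)
        assert a_step >= 0                                        # needed in step (b)
        # step (b): Han's inequality (34) scales the coefficient by m0 / (m0 + 1)
        b_step = a_step * F(m0, m0 + 1)
        assert b_step == w_coef(n, m0 + 1)
        # step (c) and identity (33)
        assert b_step + F(1, n - m0 - 1) == c(n, m0 + 1)
print("identities (28) and (33) hold for 3 <= n < 15")
```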


Readers familiar with the converse proof for the (symmetrical) MLD coding problem may recognize
certain similarities between the above proof and that in [4], particularly the use of Han's inequality. The
key difference is that in the converse proof in [4], one only needs to “peel” off the message rates by
combining information in the $W_i$'s sequentially. Here, however, the regeneration requirement necessitates a
much more elaborate peeling process.
Table 1: A code for $n = 4$ where $(\alpha, \beta) = (4, 2)$ and $(B_1, B_2, B_3) = (0, 3, 6)$.

          symbol 1    symbol 2       symbol 3          symbol 4
node 1    $z_1$       $z_5 + y_1$    $y_2$             $y_3$
node 2    $z_2$       $z_6 + y_4$    $y_1$             $y_5$
node 3    $z_3$       $z_7 + y_2$    $y_6 + z_{10}$    $y_4$
node 4    $z_4$       $z_8 + y_3$    $y_5 + z_9$       $y_6$


5 The Tradeoff Rate Region $\mathcal{R}_4(\bar B_1, \bar B_2, \bar B_3)$: Proof of Theorem 3

In this section, the proof of Theorem 3 is presented. We start by providing a new code construction that
achieves a particular normalized rate point, which will play a crucial role in the proof of Theorem 3.

5.1 A New Code 

We first prove the following proposition. 

Proposition 1. The normalized rate pair $(\frac49, \frac29) \in \mathcal{R}_4(0, \frac13, \frac23)$.

Proof. We give a novel code construction where $B_1 = 0$, $B_2 = 3$, $B_3 = 6$, $\alpha = 4$, and $\beta = 2$. For
concreteness, the code symbols and algebraic operations are assumed to be in GF($2^4$). Let us denote the
information symbols of message $M_2$ as $(x_1, x_2, x_3)$, and the symbols of message $M_3$ as $(y_1, y_2, \ldots, y_6)$.

Encoding: First use a $(10, 3)$ MDS erasure code (e.g., a Reed–Solomon code) to encode $(x_1, x_2, x_3)$
into ten coded symbols $(z_1, z_2, \ldots, z_{10})$, such that any three of these symbols can completely recover $(x_1, x_2, x_3)$.
Then place linear combinations of $(y_1, y_2, \ldots, y_6)$ and $(z_1, z_2, \ldots, z_{10})$ into the nodes as in Table 1, where
the addition $+$ is also in GF($2^4$).

Decoding $M_2$ using any two nodes: To decode $M_2$, observe that any pair of nodes has two
symbols involving the same $y_j$, in the form $z_i + y_j$ in one node and $y_j$ in the other node. For example,
node 2 has the symbol $z_6 + y_4$ and node 3 has $y_4$. This implies that $z_i$ can be recovered, and together with the
first symbols stored in this pair of nodes, we have three distinct symbols in the set $\{z_1, z_2, \ldots, z_{10}\}$. Thus,
by the property of the MDS code, these three symbols can be used to recover $(x_1, x_2, x_3)$ and thus the
message $M_2$.

Decoding $M_2$ and $M_3$ using any three nodes: Recall that using any two nodes we can recover
the message $M_2$, and thus all the code symbols $(z_1, z_2, \ldots, z_{10})$. This implies that when three nodes are
available, we can first eliminate all the $z$-symbols in the linear combinations. After this step, all the
symbols $(y_1, y_2, \ldots, y_6)$ are directly available, since each $y_j$ is stored on two nodes and any three nodes
include at least one of them; thus the message $M_3$ can be decoded.

Repair using any three nodes: To regenerate the symbols in one node from the other three, each
of the helper nodes sends the first symbol stored on it as the initial step. Denote the $y$-symbols
on the failed node as $(y_i, y_j, y_k)$, which may be stored in a form also involving $z$-symbols. The helper
nodes find among their stored symbols the ones involving $y_i$, $y_j$ and $y_k$, respectively, and send these
symbol combinations as the second step. The placement of the $y$-symbols guarantees that these symbols
are stored on the three helper nodes, one on each. Recall that from the three $z$-symbols available after the
initial step, the message $M_2$ can be recovered, and thus any of the $z$-symbols. This implies that $(y_i, y_j, y_k)$
can be recovered after eliminating the $z$-symbols from the received symbol combinations in the second
step, and thus all the symbols on the failed node can be successfully regenerated. Each helper node
contributes exactly 2 symbols in this process. □
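The encoding, decoding and repair procedures above can be exercised end to end in a small simulation. This is a sketch under stated assumptions: it replaces GF($2^4$) by the prime field GF(17) purely to keep the arithmetic simple (as the paragraph after this proof notes, the construction only needs a field that admits a $(10, 3)$ MDS code), it builds that MDS code as a Reed–Solomon code evaluating $x_1 + x_2 a + x_3 a^2$ at $a = 1, \ldots, 10$, and it stores the placement of Table 1 in the hypothetical structure `LAYOUT`. It checks decoding of $M_2$ from every pair of nodes, decoding of $(M_2, M_3)$ from every triple, and repair of every node with exactly 2 symbols per helper.

```python
import itertools
import random

# Assumption: the prime field GF(17) stands in for GF(2^4), so "+" is plain
# modular addition; any field admitting a (10, 3) MDS code would do.
P = 17
EVAL_POINTS = list(range(1, 11))  # 10 distinct points -> (10, 3) Reed-Solomon code
# Table 1: each stored symbol is z_zi + y_yi (1-based indices); None = term absent.
LAYOUT = {
    1: [(1, None), (5, 1), (None, 2), (None, 3)],
    2: [(2, None), (6, 4), (None, 1), (None, 5)],
    3: [(3, None), (7, 2), (10, 6), (None, 4)],
    4: [(4, None), (8, 3), (9, 5), (None, 6)],
}

def rs_encode(x):
    """z_i = x1 + x2*a_i + x3*a_i^2 (mod P) for each evaluation point a_i."""
    return [sum(c * pow(a, e, P) for e, c in enumerate(x)) % P for a in EVAL_POINTS]

def poly_mul_linear(poly, a):
    """Multiply a polynomial (low-to-high coefficients) by (X - a) mod P."""
    out = [0] * (len(poly) + 1)
    for k, c in enumerate(poly):
        out[k] = (out[k] - a * c) % P
        out[k + 1] = (out[k + 1] + c) % P
    return out

def rs_decode(known):
    """Lagrange-interpolate (x1, x2, x3) from {0-based z index: value}, 3 entries."""
    pts = [(EVAL_POINTS[i], v) for i, v in list(known.items())[:3]]
    coeffs = [0, 0, 0]
    for i, (a_i, v_i) in enumerate(pts):
        basis, denom = [1], 1
        for j, (a_j, _) in enumerate(pts):
            if j != i:
                basis = poly_mul_linear(basis, a_j)
                denom = denom * (a_i - a_j) % P
        scale = v_i * pow(denom, P - 2, P) % P  # divide via Fermat inverse
        coeffs = [(c + scale * b) % P for c, b in zip(coeffs, basis)]
    return coeffs

def value(z, y, zi, yi):
    """Value of the stored symbol z_zi + y_yi (absent terms count as 0)."""
    return ((z[zi - 1] if zi else 0) + (y[yi - 1] if yi else 0)) % P

def encode(x, y):
    z = rs_encode(x)
    return {node: [value(z, y, zi, yi) for zi, yi in syms] for node, syms in LAYOUT.items()}

def decode_m2(store, a, b):
    """M2 from nodes a, b: first symbols z_a, z_b plus the z_i unmasked by a shared y_j."""
    known = {a - 1: store[a][0], b - 1: store[b][0]}
    for (za, ya), va in zip(LAYOUT[a], store[a]):
        for (zb, yb), vb in zip(LAYOUT[b], store[b]):
            if ya is not None and ya == yb:
                zi, mixed, plain = (za, va, vb) if za else (zb, vb, va)
                known[zi - 1] = (mixed - plain) % P
    return rs_decode(known)

def decode_all(store, nodes):
    """(M2, M3) from three nodes: decode M2, then strip every z-term."""
    x = decode_m2(store, nodes[0], nodes[1])
    z, y = rs_encode(x), [None] * 6
    for node in nodes:
        for (zi, yi), v in zip(LAYOUT[node], store[node]):
            if yi is not None:
                y[yi - 1] = (v - (z[zi - 1] if zi else 0)) % P
    return x, y

def repair(store, failed):
    """Rebuild node `failed`; also report how many symbols each helper sent."""
    helpers = [h for h in LAYOUT if h != failed]
    sent = {h: [store[h][0]] for h in helpers}  # step 1: helper h sends z_h
    z, y = rs_encode(rs_decode({h - 1: store[h][0] for h in helpers})), [None] * 6
    for _, yi in LAYOUT[failed]:
        for h in helpers:
            for (zi, yj), v in zip(LAYOUT[h], store[h]):
                if yi is not None and yj == yi:  # step 2: the symbol involving y_yi
                    sent[h].append(v)
                    y[yi - 1] = (v - (z[zi - 1] if zi else 0)) % P
    return [value(z, y, zi, yi) for zi, yi in LAYOUT[failed]], {h: len(s) for h, s in sent.items()}

random.seed(7)
x = [random.randrange(P) for _ in range(3)]  # message M2: 3 symbols
y = [random.randrange(P) for _ in range(6)]  # message M3: 6 symbols
store = encode(x, y)
assert all(decode_m2(store, a, b) == x for a, b in itertools.combinations(LAYOUT, 2))
assert all(decode_all(store, t) == (x, y) for t in itertools.combinations(LAYOUT, 3))
for f in LAYOUT:
    rebuilt, load = repair(store, f)
    assert rebuilt == store[f] and all(count == 2 for count in load.values())
print("Table 1 code: all decoding and repair checks passed")
```

Swapping the field changes only the arithmetic, not the placement logic of Table 1, which is what the simulation exercises.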

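In the above code, the linear combinations of $z$-symbols and $y$-symbols need not be in GF($2^4$);
they can even be in the base field GF(2). The only constraint on the alphabet size comes from requiring
the existence of an appropriate MDS erasure code when encoding $(x_1, x_2, x_3)$, and we have chosen GF($2^4$)
for simplicity. Readers familiar with the work [8] may recognize that if the message $M_2 = (x_1, x_2, x_3)$ does
not exist, the remaining code involving the $y$'s is a degenerate case of the repair-by-transfer code in [8].

Table 2: Analysis of extreme points as intersections of any two inequalities.

first ineq. | second ineq. | $\bar\alpha$ | $\bar\beta$ | remark
(16) | (17) | $\bar B_1 + \bar B_2/2 + \bar B_3/3$ | $\bar B_1/3 + \bar B_2/4 + \bar B_3/3$ | (a)
(16) | (18) | $\bar B_1 + \bar B_2/2 + \bar B_3/3$ | $\bar B_1/3 + \bar B_2/4 + 7\bar B_3/24$ | (b)
(16) | (19) | $\bar B_1 + \bar B_2/2 + \bar B_3/3$ | $\bar B_1/3 + 2\bar B_2/9 + 5\bar B_3/18$ | (c)
(16) | (20) | $\bar B_1 + \bar B_2/2 + \bar B_3/3$ | $\bar B_1/3 + \bar B_2/4 + \bar B_3/4$ | (d)
(16) | (21) | $\bar B_1 + \bar B_2/2 + \bar B_3/3$ | $\bar B_1/3 + \bar B_2/5 + \bar B_3/6$ | (e)
(17) | (18) | $\bar B_1 + \bar B_2/2 + 3\bar B_3/8$ | $\bar B_1/3 + \bar B_2/4 + \bar B_3/4$ | (f)
(17) | (19) | $\bar B_1 + 25\bar B_2/48 + 3\bar B_3/8$ | $\bar B_1/3 + 5\bar B_2/24 + \bar B_3/4$ | (g)
(17) | (20) | $\bar B_1 + \bar B_2/2 + 7\bar B_3/18$ | $\bar B_1/3 + \bar B_2/4 + 2\bar B_3/9$ | (h)
(17) | (21) | $\bar B_1 + 21\bar B_2/40 + 5\bar B_3/12$ | $\bar B_1/3 + \bar B_2/5 + \bar B_3/6$ | (i)
(18) | (19) | $\bar B_1 + 7\bar B_2/12 + 3\bar B_3/8$ | $\bar B_1/3 + \bar B_2/6 + \bar B_3/4$ | (j)
(18) | (20) | $\bar B_1 + \bar B_2/2 + 5\bar B_3/12$ | $\bar B_1/3 + \bar B_2/4 + 5\bar B_3/24$ | (k)
(18) | (21) | $\bar B_1 + 11\bar B_2/20 + 11\bar B_3/24$ | $\bar B_1/3 + \bar B_2/5 + \bar B_3/6$ | (l)
(19) | (20) | $\bar B_1 + \bar B_2/3 + \bar B_3/2$ | $\bar B_1/3 + \bar B_2/3 + \bar B_3/6$ | (m)
(19) | (21) | $\bar B_1 + 8\bar B_2/15 + \bar B_3/2$ | $\bar B_1/3 + \bar B_2/5 + \bar B_3/6$ | (n)
(20) | (21) | $\bar B_1 + 3\bar B_2/5 + \bar B_3/2$ | $\bar B_1/3 + \bar B_2/5 + \bar B_3/6$ | (o)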

As we shall see next, mixing can improve the storage-repair-bandwidth tradeoff over separate coding. 
However, such an improvement may come at the expense of having to download more symbols than
the separate-coding solution when only a single message is required. Interestingly, in the above code,
this potential drawback can be eliminated by downloading only the “non-mixing” symbols when all nodes 
are functioning. 


5.2 Forward Proof of Theorem 3

To prove the forward part of Theorem 3, we examine the extreme points of the region $\mathcal{R}_4(B_1, B_2, B_3)$ for fixed $(B_1, B_2, B_3)$. It should be noted that the extreme points may differ for different values of $(B_1, B_2, B_3)$, which causes certain complications in the discussion. The intersection points of any two of the inequalities (16)–(21), when taken as equalities, are listed in Table 2 for any fixed $(B_1, B_2, B_3)$. The first two columns specify which two inequalities induce the intersection.
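The entries of Table 2, and the feasibility conditions used in the bullets that follow, can be reproduced with exact rational arithmetic. The sketch below is a minimal illustration that assumes (16)–(21) read $\alpha \ge B_1 + \frac{B_2}{2} + \frac{B_3}{3}$, $2\alpha + \beta \ge \frac73 B_1 + \frac54 B_2 + B_3$, $\alpha + \beta \ge \frac43 B_1 + \frac34 B_2 + \frac58 B_3$, $2\alpha + 3\beta \ge 3B_1 + \frac53 B_2 + \frac32 B_3$, $\alpha + 2\beta \ge \frac53 B_1 + B_2 + \frac56 B_3$ and $\beta \ge \frac{B_1}{3} + \frac{B_2}{5} + \frac{B_3}{6}$, which are the forms consistent with Table 2 and Propositions 2–5. The names `INEQS`, `intersect` and `slack` are illustrative. Each point is stored as coefficient vectors of $(B_1, B_2, B_3)$; a negative entry in a slack vector flags a $B_i$ that must vanish (or be small enough) for the point to be feasible.

```python
from fractions import Fraction as F
from itertools import combinations

# Assumed forms of (16)-(21): label -> (a, b, c), meaning
#   a*alpha + b*beta >= c[0]*B1 + c[1]*B2 + c[2]*B3.
INEQS = {
    16: (F(1), F(0), (F(1), F(1, 2), F(1, 3))),
    17: (F(2), F(1), (F(7, 3), F(5, 4), F(1))),
    18: (F(1), F(1), (F(4, 3), F(3, 4), F(5, 8))),
    19: (F(2), F(3), (F(3), F(5, 3), F(3, 2))),
    20: (F(1), F(2), (F(5, 3), F(1), F(5, 6))),
    21: (F(0), F(1), (F(1, 3), F(1, 5), F(1, 6))),
}


def intersect(i, j):
    """Solve inequalities i and j as equalities (Cramer's rule per B-component).

    Returns (alpha, beta), each a tuple of coefficients of (B1, B2, B3).
    """
    a1, b1, c1 = INEQS[i]
    a2, b2, c2 = INEQS[j]
    det = a1 * b2 - a2 * b1
    alpha = tuple((x * b2 - y * b1) / det for x, y in zip(c1, c2))
    beta = tuple((a1 * y - a2 * x) / det for x, y in zip(c1, c2))
    return alpha, beta


def slack(label, alpha, beta):
    """Coefficients s with lhs - rhs = s[0]*B1 + s[1]*B2 + s[2]*B3 for inequality `label`."""
    a, b, c = INEQS[label]
    return tuple(a * x + b * y - z for x, y, z in zip(alpha, beta, c))


# Pairs in the order (16,17), (16,18), ..., (20,21) give points (a) to (o).
PAIRS = list(combinations(sorted(INEQS), 2))
POINTS = {name: intersect(*pair) for name, pair in zip("abcdefghijklmno", PAIRS)}

for name, (alpha, beta) in POINTS.items():
    # Inequalities with a negative slack coefficient can fail at this point
    # unless the corresponding B_i is zero (or small relative to the others).
    at_risk = [k for k in INEQS if any(s < 0 for s in slack(k, alpha, beta))]
    print(name, [str(x) for x in alpha], [str(x) for x in beta], "at risk:", at_risk)
```

For instance, the slack of (20) at point (j) comes out as $(0, -\frac{1}{12}, \frac{1}{24})$, i.e., $\frac{B_3}{24} - \frac{B_2}{12} \ge 0$, which is the condition $B_2 \le \frac{B_3}{2}$ used for point (j).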


• Point (a) can be achieved by a separate-coding scheme that uses (4, 1, 3), (4, 2, 3) and (4, 3, 3) regenerating codes operating at the normalized rate pairs $(\alpha_1, \beta_1) = (1, \frac{1}{3})$, $(\alpha_2, \beta_2) = (\frac{1}{2}, \frac{1}{4})$ and $(\alpha_3, \beta_3) = (\frac{1}{3}, \frac{1}{3})$, respectively.


• By the inequality (17), points (b) and (d) are feasible only if $B_3 = 0$, and points (c) and (e) are feasible only if $B_2 = B_3 = 0$. In both cases, these points reduce to point (a), which has been shown to be achievable.


• Point (f) can be achieved by a separate-coding scheme that uses (4, 1, 3), (4, 2, 3) and (4, 3, 3) regenerating codes operating at the normalized rate pairs $(\alpha_1, \beta_1) = (1, \frac{1}{3})$, $(\alpha_2, \beta_2) = (\frac{1}{2}, \frac{1}{4})$ and $(\alpha_3, \beta_3) = (\frac{3}{8}, \frac{1}{4})$, respectively.
• By the inequality (18), point (g) is feasible only if $B_2 = 0$; point (h) is feasible only if $B_3 = 0$; and point (i) is feasible only if $B_2 = B_3 = 0$. In all three cases, these points reduce to point (f), which has been shown to be achievable.


• By the inequality (20), point (j) is feasible only if $B_2 \le \frac{B_3}{2}$. Consider a message triple $(M_1, M_2, M_3)$ with sufficiently large message rates $(B_1, B_2, B_3)$, where

$$B_2 = \frac{1-\gamma}{2}B_3, \quad \gamma \in [0, 1].$$

Split message $M_3$ into two independent sub-messages $M_{3,1}$ and $M_{3,2}$ with rates $\gamma B_3$ and $(1-\gamma)B_3 = 2B_2$, respectively. Consider encoding the messages $M_1$, $M_{3,1}$ and $(M_2, M_{3,2})$ separately. More specifically, encode message $M_1$ using a (4, 1, 3) regenerating code operating at the normalized rate pair $(1, \frac{1}{3})$; encode message $M_{3,1}$ using a (4, 3, 3) regenerating code operating at the normalized rate pair $(\frac{3}{8}, \frac{1}{4})$; and encode the messages $(M_2, M_{3,2})$ jointly using a code as described in Proposition 1, operating at the normalized rate pair $(\frac{4}{9}, \frac{2}{9})$. The total rates of this coding scheme are given by

$$\begin{aligned}
(\alpha, \beta) &= \left(B_1 + \frac{3\gamma B_3}{8} + \frac{4(B_2 + 2B_2)}{9},\ \frac{B_1}{3} + \frac{\gamma B_3}{4} + \frac{2(B_2 + 2B_2)}{9}\right) \\
&= \left(B_1 + \frac{7B_2}{12} + \frac{3B_3}{8},\ \frac{B_1}{3} + \frac{B_2}{6} + \frac{B_3}{4}\right),
\end{aligned}$$

where the second equality uses $\gamma B_3 = B_3 - 2B_2$. Normalizing both sides by $B_1 + B_2 + B_3$, we conclude that the normalized rate pair (j) is achievable whenever $B_2 = \frac{1-\gamma}{2}B_3$ for some $\gamma \in [0, 1]$, implying that point (j) is indeed achievable whenever $B_2 \le \frac{B_3}{2}$.


• By the inequality (19), point (k) is feasible only if $B_3 \le 2B_2$. Consider a message triple $(M_1, M_2, M_3)$ with sufficiently large message rates $(B_1, B_2, B_3)$, where

$$B_3 = 2(1-\gamma)B_2, \quad \gamma \in [0, 1].$$

Split message $M_2$ into two independent sub-messages $M_{2,1}$ and $M_{2,2}$ with rates $\gamma B_2$ and $(1-\gamma)B_2 = \frac{B_3}{2}$, respectively. Consider encoding the messages $M_1$, $M_{2,1}$ and $(M_{2,2}, M_3)$ separately. More specifically, encode message $M_1$ using a (4, 1, 3) regenerating code operating at the normalized rate pair $(1, \frac{1}{3})$; encode message $M_{2,1}$ using a (4, 2, 3) regenerating code operating at the normalized rate pair $(\frac{1}{2}, \frac{1}{4})$; and encode the messages $(M_{2,2}, M_3)$ jointly using a code as described in Proposition 1, operating at the normalized rate pair $(\frac{4}{9}, \frac{2}{9})$. The total rates of this coding scheme are given by

$$\begin{aligned}
(\alpha, \beta) &= \left(B_1 + \frac{\gamma B_2}{2} + \frac{4(\frac{B_3}{2} + B_3)}{9},\ \frac{B_1}{3} + \frac{\gamma B_2}{4} + \frac{2(\frac{B_3}{2} + B_3)}{9}\right) \\
&= \left(B_1 + \frac{B_2}{2} + \frac{5B_3}{12},\ \frac{B_1}{3} + \frac{B_2}{4} + \frac{5B_3}{24}\right),
\end{aligned}$$

where the second equality uses $\gamma B_2 = B_2 - \frac{B_3}{2}$. Normalizing both sides by $B_1 + B_2 + B_3$, we conclude that the normalized rate pair (k) is achievable whenever $B_3 = 2(1-\gamma)B_2$ for some $\gamma \in [0, 1]$, implying that point (k) is indeed achievable whenever $B_3 \le 2B_2$.


• By the inequalities (19) and (20), point (l) is feasible only if $B_2 = B_3 = 0$. In this case, point (l) reduces to point (a), which has been shown to be achievable.
• By the inequality (18), point (m) is feasible only if $B_2 \le \frac{B_3}{2}$. Consider a message triple $(M_1, M_2, M_3)$ with sufficiently large message rates $(B_1, B_2, B_3)$, where

$$B_2 = \frac{1-\gamma}{2}B_3, \quad \gamma \in [0, 1].$$

Split message $M_3$ into two independent sub-messages $M_{3,1}$ and $M_{3,2}$ with rates $\gamma B_3$ and $(1-\gamma)B_3 = 2B_2$, respectively. Consider encoding the messages $M_1$, $M_{3,1}$ and $(M_2, M_{3,2})$ separately. More specifically, encode message $M_1$ using a (4, 1, 3) regenerating code operating at the normalized rate pair $(1, \frac{1}{3})$; encode message $M_{3,1}$ using a (4, 3, 3) regenerating code operating at the normalized rate pair $(\frac{1}{2}, \frac{1}{6})$; and encode the messages $(M_2, M_{3,2})$ jointly using a code as described in Proposition 1, operating at the normalized rate pair $(\frac{4}{9}, \frac{2}{9})$. The total rates of this coding scheme are given by

$$\begin{aligned}
(\alpha, \beta) &= \left(B_1 + \frac{\gamma B_3}{2} + \frac{4(B_2 + 2B_2)}{9},\ \frac{B_1}{3} + \frac{\gamma B_3}{6} + \frac{2(B_2 + 2B_2)}{9}\right) \\
&= \left(B_1 + \frac{B_2}{3} + \frac{B_3}{2},\ \frac{B_1}{3} + \frac{B_2}{3} + \frac{B_3}{6}\right),
\end{aligned}$$

where the second equality uses $\gamma B_3 = B_3 - 2B_2$. Normalizing both sides by $B_1 + B_2 + B_3$, we conclude that the normalized rate pair (m) is achievable whenever $B_2 = \frac{1-\gamma}{2}B_3$ for some $\gamma \in [0, 1]$, implying that point (m) is indeed achievable whenever $B_2 \le \frac{B_3}{2}$.


• By the inequality (20), point (n) is feasible only if $B_2 = 0$. In this case, point (n) can be achieved by a separate-coding scheme that uses (4, 1, 3) and (4, 3, 3) regenerating codes operating at the normalized rate pairs $(\alpha_1, \beta_1) = (1, \frac{1}{3})$ and $(\alpha_3, \beta_3) = (\frac{1}{2}, \frac{1}{6})$, respectively.

• Finally, point (o) can be achieved by a separate-coding scheme that uses (4, 1, 3), (4, 2, 3) and (4, 3, 3) regenerating codes operating at the normalized rate pairs $(\alpha_1, \beta_1) = (1, \frac{1}{3})$, $(\alpha_2, \beta_2) = (\frac{3}{5}, \frac{1}{5})$ and $(\alpha_3, \beta_3) = (\frac{1}{2}, \frac{1}{6})$, respectively.
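The achievability arguments for points (j), (k) and (m) all rest on one bookkeeping rule: each independently encoded (sub-)message contributes its rate times its code's normalized rate pair to the totals. A minimal sketch of that bookkeeping, assuming the operating points quoted in the bullets, including $(\frac49, \frac29)$ for the Proposition 1 code (the helper names `total`, `scheme_j`, `scheme_k`, `scheme_m` and `evaluate` are hypothetical):

```python
from fractions import Fraction as F

# Normalized (alpha, beta) operating points quoted in the bullets.
P413 = (F(1), F(1, 3))         # (4,1,3) code
P423 = (F(1, 2), F(1, 4))      # (4,2,3) code
P433_MID = (F(3, 8), F(1, 4))  # (4,3,3) code used for point (j)
P433_MBR = (F(1, 2), F(1, 6))  # (4,3,3) code used for point (m)
PMIX = (F(4, 9), F(2, 9))      # joint code of Proposition 1 (parts in ratio 1 : 2)


def total(components):
    """Sum rate * (alpha_i, beta_i) over independently encoded (sub-)messages."""
    alpha = sum(rate * p[0] for rate, p in components)
    beta = sum(rate * p[1] for rate, p in components)
    return alpha, beta


def scheme_j(B1, B3, g):
    """Point (j): B2 = (1 - g)/2 * B3; M3 split into rates g*B3 and 2*B2."""
    B2 = (1 - g) / 2 * B3
    return B2, total([(B1, P413), (g * B3, P433_MID), (B2 + 2 * B2, PMIX)])


def scheme_k(B1, B2, g):
    """Point (k): B3 = 2*(1 - g)*B2; M2 split into rates g*B2 and B3/2."""
    B3 = 2 * (1 - g) * B2
    return B3, total([(B1, P413), (g * B2, P423), (B3 / 2 + B3, PMIX)])


def scheme_m(B1, B3, g):
    """Point (m): same split as (j), but the (4,3,3) code runs at its MBR point."""
    B2 = (1 - g) / 2 * B3
    return B2, total([(B1, P413), (g * B3, P433_MBR), (B2 + 2 * B2, PMIX)])


def evaluate(alpha_c, beta_c, B):
    """Evaluate a Table 2 point from its coefficient vectors of (B1, B2, B3)."""
    return (sum(c * b for c, b in zip(alpha_c, B)),
            sum(c * b for c, b in zip(beta_c, B)))


for g in (F(0), F(1, 2), F(1)):
    B2, rates = scheme_j(F(2), F(3), g)
    target = evaluate((1, F(7, 12), F(3, 8)), (F(1, 3), F(1, 6), F(1, 4)), (F(2), B2, F(3)))
    print("gamma =", g, "matches (j):", rates == target)
```

Because $\gamma$ is eliminated once the rate split is substituted, the totals land on the same Table 2 point for every $\gamma \in [0, 1]$; $\gamma$ only controls which ratio between $B_2$ and $B_3$ is being covered.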


The proof is now complete. 


□ 


5.3 Converse Proof of Theorem 3

To establish the converse of Theorem 3, we shall prove that every rate pair $(\alpha, \beta) \in \mathcal{R}_4(B_1, B_2, B_3)$ must satisfy the inequalities (16)–(21). The inequality (16) holds even without the regeneration requirement [4], and the inequality (17) follows directly from Theorem 1 by setting $n = 4$. It remains to show that the inequalities (18)–(21) hold, and we shall prove each as a separate proposition. Instead of writing the proofs in the conventional fashion as chains of inequalities, we utilize the computational approach developed in [13] and prove these inequalities by tabulation. Our proof of each inequality is given as two tables: the first lists the joint entropy terms in the proof, and the second lists the coefficients of the needed inequalities. The last row, which is the sum of all the other rows, is exactly the sought-after inequality. Note that each row of the second table, except for the last one, is a “simple” Shannon-type inequality, possibly after a permutation of the indices in each entropy term. For example, the third line of Table 4 is

$$2H(S_{4\to3}) + 2H(S_{4\to2}, S_{3\to2}) - 2H(S_{4\to1}, S_{3\to1}, S_{2\to1}) \ge 0, \tag{35}$$

which is equivalent to the simple independence bound on entropy

$$H(S_{2\to1}) + H(S_{4\to1}, S_{3\to1}) - H(S_{4\to1}, S_{3\to1}, S_{2\to1}) \ge 0 \tag{36}$$

after taking into account the symmetries $H(S_{4\to3}) = H(S_{2\to1})$ and $H(S_{4\to2}, S_{3\to2}) = H(S_{4\to1}, S_{3\to1})$ in the assumed solution set. Further note that for $n = 4$ we have $B_3^+ = B_1 + B_2 + B_3 = 1$; however, we still write it as $B_3^+$ to make explicit its meaning as the sum rate.

Table 3: Terms needed to prove Proposition 2.

| Term | Joint entropy |
|---|---|
| $T_1$ | $H(S_{4\to3})$ |
| $T_2$ | $H(S_{4\to2}, S_{3\to2})$ |
| $T_3$ | $H(S_{4\to1}, S_{3\to1}, S_{2\to1})$ |
| $T_4$ | $H(W_4)$ |
| $T_5$ | $H(S_{3\to2}, W_4)$ |
| $T_6$ | $H(S_{3\to4}, S_{2\to4}, W_4)$ |
| $T_7$ | $H(W_4, W_3)$ |
| $T_8$ | $H(S_{2\to4}, W_4, W_3)$ |
| $T_9$ | $H(S_{3\to1}, S_{2\to1}, W_4, W_1)$ |
| $T_{10}$ | $H(S_{4\to3}, M_1)$ |
| $T_{11}$ | $H(S_{4\to2}, S_{3\to2}, M_1)$ |
| $T_{12}$ | $H(W_4, M_1, M_2)$ |
| $T_{13}$ | $H(S_{3\to2}, W_4, M_1, M_2)$ |
| $T_{14}$ | $H(S_{3\to4}, W_4, M_1, M_2)$ |
| $T_{15}$ | $H(S_{3\to2}, S_{2\to4}, W_4, M_1, M_2)$ |
| $T_{16}$ | $H(M_1) = B_1^+$ |
| $T_{17}$ | $H(M_1, M_2) = B_2^+$ |
| $T_{18}$ | $H(M_1, M_2, M_3) = B_3^+$ |
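Rows such as (36) are ordinary Shannon-type inequalities, so they must hold for every joint distribution; the left-hand side of (36) is the mutual information $I(S_{2\to1}; S_{4\to1}, S_{3\to1})$. As an illustrative numerical sanity check only (the tabulation method of [13] verifies rows symbolically, not by sampling), the sketch below evaluates that left-hand side on random binary distributions, with three generic random variables standing in for $S_{2\to1}$, $S_{4\to1}$ and $S_{3\to1}$; the helper names are hypothetical.

```python
import itertools
import math
import random


def entropy(pmf, coords):
    """Joint entropy in bits of the coordinates `coords` under the joint pmf."""
    marginal = {}
    for outcome, p in pmf.items():
        key = tuple(outcome[i] for i in coords)
        marginal[key] = marginal.get(key, 0.0) + p
    return -sum(p * math.log2(p) for p in marginal.values() if p > 0)


def random_pmf(n_vars, alphabet, rng):
    """A random joint pmf over n_vars variables with the given alphabet size."""
    outcomes = list(itertools.product(range(alphabet), repeat=n_vars))
    weights = [rng.random() for _ in outcomes]
    total = sum(weights)
    return {o: w / total for o, w in zip(outcomes, weights)}


def independence_gap(pmf):
    # Coordinate 0 stands in for S_{2->1}; coordinates 1, 2 for S_{4->1}, S_{3->1}.
    # Left-hand side of (36): H(X) + H(Y, Z) - H(X, Y, Z) = I(X; Y, Z) >= 0.
    return entropy(pmf, [0]) + entropy(pmf, [1, 2]) - entropy(pmf, [0, 1, 2])


rng = random.Random(0)
gaps = [independence_gap(random_pmf(3, 2, rng)) for _ in range(1000)]
print(f"smallest gap over 1000 random pmfs: {min(gaps):.6f}")
```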

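Proposition 2. For any $(\alpha, \beta) \in \mathcal{R}_4(B_1, B_2, B_3)$, we have

$$24(\alpha + \beta) \ge 14B_1^+ + 3B_2^+ + 15B_3^+ = 24\left(\frac{4}{3}B_1 + \frac{3}{4}B_2 + \frac{5}{8}B_3\right). \tag{37}$$

Proof. See Tables 3 and 4. □

Proposition 3. For any $(\alpha, \beta) \in \mathcal{R}_4(B_1, B_2, B_3)$, we have

$$6(2\alpha + 3\beta) \ge 8B_1^+ + B_2^+ + 9B_3^+ = 6\left(3B_1 + \frac{5}{3}B_2 + \frac{3}{2}B_3\right). \tag{38}$$

Proof. See Tables 5 and 6. □

Proposition 4. For any $(\alpha, \beta) \in \mathcal{R}_4(B_1, B_2, B_3)$, we have

$$12(\alpha + 2\beta) \ge 8B_1^+ + 2B_2^+ + 10B_3^+ = 12\left(\frac{5}{3}B_1 + B_2 + \frac{5}{6}B_3\right). \tag{39}$$

Proof. See Tables 7 and 8. □

Proposition 5. For any $(\alpha, \beta) \in \mathcal{R}_4(B_1, B_2, B_3)$, we have

$$30\beta \ge 4B_1^+ + B_2^+ + 5B_3^+ = 30\left(\frac{1}{3}B_1 + \frac{1}{5}B_2 + \frac{1}{6}B_3\right). \tag{40}$$

Proof. See Tables 9 and 10. □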
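The second equality in each of (37)–(40) is pure bookkeeping on the partial sums. Assuming $B_1^+ = B_1$, $B_2^+ = B_1 + B_2$ and $B_3^+ = B_1 + B_2 + B_3$ (consistent with $H(M_1) = B_1^+$, $H(M_1, M_2) = B_2^+$ and $H(M_1, M_2, M_3) = B_3^+$ in Tables 3, 5, 7 and 9), the following sketch checks it with exact fractions; `expand_plus` and `PROPOSITIONS` are illustrative names.

```python
from fractions import Fraction as F


def expand_plus(c1, c2, c3):
    """Rewrite c1*B1^+ + c2*B2^+ + c3*B3^+ as coefficients of (B1, B2, B3),
    using B1^+ = B1, B2^+ = B1 + B2 and B3^+ = B1 + B2 + B3."""
    return (c1 + c2 + c3, c2 + c3, c3)


# proposition -> (scale on the left-hand side, (c1, c2, c3), stated normalized coefficients)
PROPOSITIONS = {
    2: (24, (14, 3, 15), (F(4, 3), F(3, 4), F(5, 8))),
    3: (6, (8, 1, 9), (F(3), F(5, 3), F(3, 2))),
    4: (12, (8, 2, 10), (F(5, 3), F(1), F(5, 6))),
    5: (30, (4, 1, 5), (F(1, 3), F(1, 5), F(1, 6))),
}

for prop, (scale, plus, stated) in PROPOSITIONS.items():
    derived = tuple(F(c, scale) for c in expand_plus(*plus))
    status = "matches" if derived == stated else "MISMATCH"
    print(f"Proposition {prop}:", [str(c) for c in derived], status)
```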
Table 4: Proof by tabulation of Proposition 2, with terms defined in Table 3.

[Coefficient rows over the columns $T_1, \ldots, T_{18}$; the row and column alignment was lost in text extraction.] The last row, the sum of all the other rows, is $24T_1 + 24T_4 - 14T_{16} - 3T_{17} - 15T_{18} \ge 0$.


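Table 5: Terms needed to prove Proposition 3.

| Term | Joint entropy |
|---|---|
| $T_1$ | $H(S_{4\to3})$ |
| $T_2$ | $H(S_{4\to2}, S_{3\to2})$ |
| $T_3$ | $H(S_{4\to1}, S_{3\to1}, S_{2\to1})$ |
| $T_4$ | $H(W_4)$ |
| $T_5$ | $H(S_{3\to2}, W_4)$ |
| $T_6$ | $H(S_{3\to4}, W_4)$ |
| $T_7$ | $H(S_{3\to4}, S_{2\to4}, W_4)$ |
| $T_8$ | $H(W_4, W_3)$ |
| $T_9$ | $H(S_{2\to4}, W_4, W_3)$ |
| $T_{10}$ | $H(S_{3\to1}, S_{2\to1}, W_4, W_1)$ |
| $T_{11}$ | $H(S_{4\to3}, M_1)$ |
| $T_{12}$ | |
| $T_{13}$ | $H(S_{4\to3}, S_{3\to4}, M_1)$ |
| $T_{14}$ | $H(S_{4\to3}, S_{3\to4}, S_{2\to4}, M_1)$ |
| $T_{15}$ | $H(W_4, M_1, M_2)$ |
| $T_{16}$ | $H(S_{3\to2}, W_4, M_1, M_2)$ |
| $T_{17}$ | $H(S_{3\to4}, W_4, M_1, M_2)$ |
| $T_{18}$ | $H(S_{3\to2}, S_{2\to4}, W_4, M_1, M_2)$ |
| $T_{19}$ | $H(M_1) = B_1^+$ |
| $T_{20}$ | $H(M_1, M_2) = B_2^+$ |
| $T_{21}$ | $H(M_1, M_2, M_3) = B_3^+$ |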
Table 6: Proof by tabulation of Proposition 3, with terms defined in Table 5.

[Coefficient rows over the columns $T_1, \ldots, T_{21}$; the row and column alignment was lost in text extraction.] The last row, the sum of all the other rows, is $18T_1 + 12T_4 - 8T_{19} - T_{20} - 9T_{21} \ge 0$.


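Table 7: Terms needed to prove Proposition 4.

| Term | Joint entropy |
|---|---|
| $T_1$ | $H(S_{4\to3})$ |
| $T_2$ | $H(S_{4\to2}, S_{3\to2})$ |
| $T_3$ | $H(S_{4\to1}, S_{3\to1}, S_{2\to1})$ |
| $T_4$ | $H(W_4)$ |
| $T_5$ | $H(S_{3\to4}, W_4)$ |
| $T_6$ | $H(S_{3\to2}, W_4)$ |
| $T_7$ | $H(S_{3\to4}, S_{2\to4}, W_4)$ |
| $T_8$ | $H(W_4, W_3)$ |
| $T_9$ | $H(S_{2\to4}, W_4, W_3)$ |
| $T_{10}$ | $H(S_{3\to1}, S_{2\to1}, W_4, W_1)$ |
| $T_{11}$ | $H(S_{3\to4}, S_{3\to1}, S_{2\to1}, W_4, W_1)$ |
| $T_{12}$ | $H(S_{4\to3}, M_1)$ |
| $T_{13}$ | $H(S_{4\to3}, M_1, M_2)$ |
| $T_{14}$ | $H(S_{4\to2}, S_{3\to4}, M_1, M_2)$ |
| $T_{15}$ | $H(S_{4\to2}, S_{3\to2}, M_1, M_2)$ |
| $T_{16}$ | $H(S_{3\to2}, W_4, M_1, M_2)$ |
| $T_{17}$ | $H(S_{3\to4}, W_4, M_1, M_2)$ |
| $T_{18}$ | $H(S_{3\to2}, S_{2\to4}, W_4, M_1, M_2)$ |
| $T_{19}$ | $H(S_{3\to2}, S_{2\to3}, W_4, M_1, M_2)$ |
| $T_{20}$ | $H(S_{3\to2}, S_{2\to4}, S_{1\to4}, W_4, M_1, M_2)$ |
| $T_{21}$ | $H(M_1) = B_1^+$ |
| $T_{22}$ | $H(M_1, M_2) = B_2^+$ |
| $T_{23}$ | $H(M_1, M_2, M_3) = B_3^+$ |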
Table 8: Proof by tabulation of Proposition 4, with terms defined in Table 7.

[Coefficient rows over the columns $T_1, \ldots, T_{23}$; the row and column alignment was lost in text extraction.] The last row, the sum of all the other rows, is $24T_1 + 12T_4 - 8T_{21} - 2T_{22} - 10T_{23} \ge 0$.


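Table 9: Terms needed to prove Proposition 5.

| Term | Joint entropy |
|---|---|
| $T_1$ | $H(S_{4\to3})$ |
| $T_2$ | $H(S_{4\to2}, S_{3\to2})$ |
| $T_3$ | $H(S_{4\to1}, S_{3\to1}, S_{2\to1})$ |
| $T_4$ | $H(W_4)$ |
| $T_5$ | $H(S_{3\to2}, W_4)$ |
| $T_6$ | $H(S_{3\to1}, S_{2\to1}, W_4, W_1)$ |
| $T_7$ | $H(S_{4\to2}, S_{4\to1}, S_{3\to2}, S_{3\to1}, W_2, W_1)$ |
| $T_8$ | $H(S_{4\to3}, M_1)$ |
| $T_9$ | $H(S_{4\to3}, M_1, M_2)$ |
| $T_{10}$ | $H(S_{4\to2}, S_{3\to2}, M_1)$ |
| $T_{11}$ | $H(S_{4\to3}, S_{3\to4}, M_1)$ |
| $T_{12}$ | $H(S_{4\to2}, S_{3\to ?}, M_1, M_2)$ |
| $T_{13}$ | $H(S_{4\to1}, S_{3\to4}, S_{2\to4}, M_1, M_2)$ |
| $T_{14}$ | $H(S_{4\to1}, S_{3\to4}, S_{3\to2}, S_{2\to4}, M_1, M_2)$ |
| $T_{15}$ | $H(M_1) = B_1^+$ |
| $T_{16}$ | $H(M_1, M_2) = B_2^+$ |
| $T_{17}$ | $H(M_1, M_2, M_3) = B_3^+$ |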
Table 10: Proof by tabulation of Proposition 5, with terms defined in Table 9.

[Coefficient rows over the columns $T_1, \ldots, T_{17}$; the row and column alignment was lost in text extraction.] The last row, the sum of all the other rows, is $30T_1 - 4T_{15} - T_{16} - 5T_{17} \ge 0$.


6 Concluding remarks 


We considered the problem of multilevel diversity coding with regeneration, which addresses the storage 
vs. repair-bandwidth tradeoff in distributed storage systems with heterogeneous reliability and access 
latency requirements. It was shown that for the minimum storage point on the optimal tradeoff curve, 
separate coding is sufficient, and there is no need to mix different contents. On the other hand, a complete 
characterization of the tradeoff region was provided for the case of four nodes, which reveals that mixing 
in general can strictly improve the overall tradeoff. 

Although we focused on the case $d = n - 1$, some of the results can be generalized to $d < n - 1$ in a straightforward manner, by recognizing that any MLD-R system with $d < n - 1$ includes an MLD-R subsystem with $n' = d + 1$ nodes, i.e., with $d = n' - 1$. In particular, the optimality of separate coding at the MSR point holds for $d < n - 1$ as well. It is also worth mentioning that in a recent work [16], separate coding was shown to be optimal at the MBR point as well, and thus the benefit of mixing manifests only in the intermediate tradeoff regime.

A notable feature of this work is that we further developed the computational approach in [13] to identify and prove the converse theorems. As a result, the converse proof was presented as tabulation without being translated into the conventional form of proofs usually seen in the information theory literature. It is our belief that this computational approach will be able to play an even more significant role in future studies. To share our data with the research community, we have posted the computational results presented in this paper as part of the online collection of “Solutions of Computed Information Theoretic Limits (SCITL)” hosted at [15], which we hope can serve in the future as a data depot for information-theoretic limits obtained through computational approaches. We are currently working toward extending the results obtained so far to more general parameters.

There are several immediate research directions to follow. First, the code construction given for $n = 4$ can be generalized to other parameters in a relatively straightforward manner, and we shall address this issue in a forthcoming work. Second, it is important to understand how much improvement, in general, mixing the contents can attain over separate coding. Finally, it may also be useful to consider the analogous requirement in the locally repairable code setting [17–19].
Appendix: Proofs of Lemma 1, Lemma 2, Theorem 2 and Corollary 1

Proof of Lemma 1. It is known that the normalized tradeoff region for the (3, 1, 2) regenerating codes is given by the set of $(\alpha_1, \beta_1)$ pairs satisfying

$$\alpha_1 \ge 1 \quad \text{and} \quad 2\beta_1 \ge 1, \tag{41}$$

and the normalized tradeoff region for the (3, 2, 2) regenerating codes is given by the set of $(\alpha_2, \beta_2)$ pairs satisfying

$$2\alpha_2 \ge 1, \quad \alpha_2 + \beta_2 \ge 1, \quad \text{and} \quad 3\beta_2 \ge 1. \tag{42}$$

Using Fourier–Motzkin elimination, it is straightforward to verify that the separate-coding normalized tradeoff region $\bar{\mathcal{R}}_3(B_1, B_2) = \{(\alpha_1 B_1 + \alpha_2 B_2,\ \beta_1 B_1 + \beta_2 B_2)\}$ is indeed given by Lemma 1. □
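For $n = 3$ the elimination is small enough to carry out mechanically. The sketch below is a minimal Fourier–Motzkin implementation for fixed positive $(B_1, B_2)$: it substitutes $\alpha_1, \beta_1$ out using $\alpha = \alpha_1 B_1 + \alpha_2 B_2$ and $\beta = \beta_1 B_1 + \beta_2 B_2$, then eliminates $\alpha_2$ and $\beta_2$ from (41)–(42). The function names are hypothetical; the output is expected to match the three inequalities $\alpha \ge B_1 + \frac{B_2}{2}$, $\alpha + \beta \ge \frac32 B_1 + B_2$ and $\beta \ge \frac{B_1}{2} + \frac{B_2}{3}$ that appear in the converse of Theorem 2.

```python
from fractions import Fraction as F


def eliminate(rows, k):
    """One Fourier-Motzkin step: remove variable k from rows (coeffs, rhs),
    each meaning sum(coeffs[i] * x[i]) >= rhs."""
    keep = [r for r in rows if r[0][k] == 0]
    pos = [r for r in rows if r[0][k] > 0]
    neg = [r for r in rows if r[0][k] < 0]
    for cp, rp in pos:
        for cn, rn in neg:
            wp, wn = -cn[k], cp[k]  # nonnegative weights that cancel x_k
            keep.append(([wp * a + wn * b for a, b in zip(cp, cn)], wp * rp + wn * rn))
    return keep


def separate_region_n3(B1, B2):
    """Inequalities in (alpha, beta) for separate coding with n = 3, assuming B1, B2 > 0.

    Variables x = (alpha, beta, alpha2, beta2); alpha1 and beta1 are substituted out.
    Each result is (coefficient of alpha, coefficient of beta, right-hand side).
    """
    B1, B2 = F(B1), F(B2)
    rows = [
        ([F(1), F(0), -B2, F(0)], B1),       # (41): alpha1 >= 1
        ([F(0), F(2), F(0), -2 * B2], B1),   # (41): 2*beta1 >= 1
        ([F(0), F(0), F(2), F(0)], F(1)),    # (42): 2*alpha2 >= 1
        ([F(0), F(0), F(1), F(1)], F(1)),    # (42): alpha2 + beta2 >= 1
        ([F(0), F(0), F(0), F(3)], F(1)),    # (42): 3*beta2 >= 1
    ]
    for k in (2, 3):
        rows = eliminate(rows, k)
    result = set()
    for coeffs, rhs in rows:
        a, b = coeffs[0], coeffs[1]
        if a == 0 and b == 0:
            continue  # a constant row carries no information about (alpha, beta)
        lead = abs(a if a != 0 else b)  # positive scaling keeps the direction of >=
        result.add((a / lead, b / lead, rhs / lead))
    return sorted(result)


print(separate_region_n3(F(1, 2), F(1, 2)))
```

With $B_1 = B_2 = \frac12$ (so that the rates are normalized), the call returns exactly three constraints, one for each of $\beta$, $\alpha$ and $\alpha + \beta$, with no redundant rows to prune in this small case.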

Proof of Lemma 2. It is known that the normalized tradeoff region for the (4, 1, 3) regenerating codes is given by the set of $(\alpha_1, \beta_1)$ pairs satisfying

$$\alpha_1 \ge 1 \quad \text{and} \quad 3\beta_1 \ge 1, \tag{43}$$

the normalized tradeoff region for the (4, 2, 3) regenerating codes is given by the set of $(\alpha_2, \beta_2)$ pairs satisfying [6]

$$2\alpha_2 \ge 1, \quad \alpha_2 + 2\beta_2 \ge 1, \quad \text{and} \quad 5\beta_2 \ge 1, \tag{44}$$

and the normalized tradeoff region for the (4, 3, 3) regenerating codes is given by the set of $(\alpha_3, \beta_3)$ pairs satisfying [13]

$$3\alpha_3 \ge 1, \quad 2\alpha_3 + \beta_3 \ge 1, \quad 4\alpha_3 + 6\beta_3 \ge 3, \quad \text{and} \quad 6\beta_3 \ge 1. \tag{45}$$


However, unlike for $n = 3$, using Fourier–Motzkin elimination to directly obtain a polyhedral description of $\bar{\mathcal{R}}_4(B_1, B_2, B_3)$ is simply too time-consuming. Instead, denoting the set of $(\alpha, \beta)$ pairs constrained by the inequalities (6)–(10) as $\tilde{\mathcal{R}}_4(B_1, B_2, B_3)$, we shall show $\bar{\mathcal{R}}_4(B_1, B_2, B_3) \subseteq \tilde{\mathcal{R}}_4(B_1, B_2, B_3)$ and $\tilde{\mathcal{R}}_4(B_1, B_2, B_3) \subseteq \bar{\mathcal{R}}_4(B_1, B_2, B_3)$ separately.

To show that $\bar{\mathcal{R}}_4(B_1, B_2, B_3) \subseteq \tilde{\mathcal{R}}_4(B_1, B_2, B_3)$, we need to show that any $(\alpha, \beta)$ pair in $\bar{\mathcal{R}}_4(B_1, B_2, B_3)$ must satisfy the inequalities (6)–(10). Consider the inequality (8), for example. For any $(\alpha, \beta) \in \bar{\mathcal{R}}_4(B_1, B_2, B_3)$, we have

$$\begin{aligned}
4\alpha + 6\beta &= 4(\alpha_1 B_1 + \alpha_2 B_2 + \alpha_3 B_3) + 6(\beta_1 B_1 + \beta_2 B_2 + \beta_3 B_3) \\
&= (4\alpha_1 + 6\beta_1)B_1 + [\alpha_2 + 3(\alpha_2 + 2\beta_2)]B_2 + (4\alpha_3 + 6\beta_3)B_3 \\
&\ge (4 + 2)B_1 + \left(\frac{1}{2} + 3\right)B_2 + 3B_3 \\
&= 6B_1 + \frac{7}{2}B_2 + 3B_3,
\end{aligned}$$

where the inequality follows directly from (43)–(45). The other four inequalities can be proved similarly; the details are omitted here. We thus conclude that $\bar{\mathcal{R}}_4(B_1, B_2, B_3) \subseteq \tilde{\mathcal{R}}_4(B_1, B_2, B_3)$.
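The remaining four inequalities follow the same pattern. For a direction $(a, b)$ with $a, b \ge 0$, the smallest value of $a\alpha_i + b\beta_i$ over each individual region is attained at one of its extreme points (listed right after (49)), so the best right-hand-side coefficient of $B_i$ is that minimum. A minimal sketch, assuming those extreme points; `VERTICES` and `separate_rhs` are illustrative names.

```python
from fractions import Fraction as F

# Extreme points of the individual normalized regions implied by (43)-(45).
VERTICES = {
    1: [(F(1), F(1, 3))],                                            # (4,1,3)
    2: [(F(1, 2), F(1, 4)), (F(3, 5), F(1, 5))],                    # (4,2,3)
    3: [(F(1, 3), F(1, 3)), (F(3, 8), F(1, 4)), (F(1, 2), F(1, 6))],  # (4,3,3)
}


def separate_rhs(a, b):
    """Largest c with a*alpha + b*beta >= c1*B1 + c2*B2 + c3*B3 under separate coding.

    For a, b >= 0, a*alpha_i + b*beta_i is minimized over each (upward-closed)
    individual region at one of its extreme points.
    """
    return tuple(min(a * x + b * y for x, y in VERTICES[i]) for i in (1, 2, 3))


for direction in [(1, 0), (2, 1), (4, 6), (1, 2), (0, 1)]:
    print(direction, [str(c) for c in separate_rhs(*direction)])
```

For instance, `separate_rhs(2, 3)` gives $(3, \frac74, \frac32)$, i.e., half of (8), whereas (19) only carries $\frac53$ as the coefficient of $B_2$; this difference in the $B_2$ coefficient is exactly where (8) and (19) part ways in the proof of Corollary 1.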

To show that $\tilde{\mathcal{R}}_4(B_1, B_2, B_3) \subseteq \bar{\mathcal{R}}_4(B_1, B_2, B_3)$, first note that the characteristic cone of $\tilde{\mathcal{R}}_4(B_1, B_2, B_3)$ is given by $\{(\alpha, \beta) : \alpha \ge 0, \beta \ge 0\}$. By the definition of $\bar{\mathcal{R}}_4(B_1, B_2, B_3)$, any ray of $\tilde{\mathcal{R}}_4(B_1, B_2, B_3)$ is also a ray of $\bar{\mathcal{R}}_4(B_1, B_2, B_3)$. To examine the extreme points of $\tilde{\mathcal{R}}_4(B_1, B_2, B_3)$, we compute the intersections between any two of the inequalities (6)–(10), taken as equalities. This yields a total of ten points, which are the possible extreme points of $\tilde{\mathcal{R}}_4(B_1, B_2, B_3)$. However, some of them do not satisfy all the inequalities², and after eliminating them, the possible extreme points of $\tilde{\mathcal{R}}_4(B_1, B_2, B_3)$ are given by:


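$$\left(B_1 + \frac{B_2}{2} + \frac{B_3}{3},\ \frac{B_1}{3} + \frac{B_2}{4} + \frac{B_3}{3}\right), \tag{46}$$

$$\left(B_1 + \frac{B_2}{2} + \frac{3B_3}{8},\ \frac{B_1}{3} + \frac{B_2}{4} + \frac{B_3}{4}\right), \tag{47}$$

$$\left(B_1 + \frac{B_2}{2} + \frac{B_3}{2},\ \frac{B_1}{3} + \frac{B_2}{4} + \frac{B_3}{6}\right), \tag{48}$$

and

$$\left(B_1 + \frac{3B_2}{5} + \frac{B_3}{2},\ \frac{B_1}{3} + \frac{B_2}{5} + \frac{B_3}{6}\right). \tag{49}$$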


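Note from (43)–(45) that the extreme points of the normalized tradeoff regions for the (4, 1, 3), (4, 2, 3) and (4, 3, 3) regenerating codes are given by

$$(\alpha_1, \beta_1) = \left(1, \frac{1}{3}\right), \quad (\alpha_2, \beta_2) \in \left\{\left(\frac{1}{2}, \frac{1}{4}\right), \left(\frac{3}{5}, \frac{1}{5}\right)\right\}, \quad \text{and} \quad (\alpha_3, \beta_3) \in \left\{\left(\frac{1}{3}, \frac{1}{3}\right), \left(\frac{3}{8}, \frac{1}{4}\right), \left(\frac{1}{2}, \frac{1}{6}\right)\right\}.$$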

Therefore,

• point (46) can be achieved by separate coding that uses (4, 1, 3), (4, 2, 3) and (4, 3, 3) regenerating codes operating at normalized rate pairs $(\alpha_1, \beta_1) = (1, \frac{1}{3})$, $(\alpha_2, \beta_2) = (\frac{1}{2}, \frac{1}{4})$ and $(\alpha_3, \beta_3) = (\frac{1}{3}, \frac{1}{3})$, respectively;

• point (47) can be achieved by separate coding that uses (4, 1, 3), (4, 2, 3) and (4, 3, 3) regenerating codes operating at normalized rate pairs $(\alpha_1, \beta_1) = (1, \frac{1}{3})$, $(\alpha_2, \beta_2) = (\frac{1}{2}, \frac{1}{4})$ and $(\alpha_3, \beta_3) = (\frac{3}{8}, \frac{1}{4})$, respectively;

• point (48) can be achieved by separate coding that uses (4, 1, 3), (4, 2, 3) and (4, 3, 3) regenerating codes operating at normalized rate pairs $(\alpha_1, \beta_1) = (1, \frac{1}{3})$, $(\alpha_2, \beta_2) = (\frac{1}{2}, \frac{1}{4})$ and $(\alpha_3, \beta_3) = (\frac{1}{2}, \frac{1}{6})$, respectively; and

• point (49) can be achieved by separate coding that uses (4, 1, 3), (4, 2, 3) and (4, 3, 3) regenerating codes operating at normalized rate pairs $(\alpha_1, \beta_1) = (1, \frac{1}{3})$, $(\alpha_2, \beta_2) = (\frac{3}{5}, \frac{1}{5})$ and $(\alpha_3, \beta_3) = (\frac{1}{2}, \frac{1}{6})$, respectively.

We thus conclude that $\tilde{\mathcal{R}}_4(B_1, B_2, B_3) \subseteq \bar{\mathcal{R}}_4(B_1, B_2, B_3)$, completing the proof of Lemma 2. □
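As an independent cross-check that uses a different technique from the intersection argument above: when all $B_i > 0$, the separate-coding region is the Minkowski sum of the three individual regions scaled by $B_1$, $B_2$ and $B_3$, and a sum of extreme points is an extreme point of the Minkowski sum exactly when their normal cones share an interior direction. Parametrizing the directions $a\alpha + b\beta$ by $t = b/a$, the sketch below (all names hypothetical, extreme points taken from the list after (45)) recovers which vertex combinations survive.

```python
from fractions import Fraction as F
from itertools import product

# Extreme points of the per-code normalized regions from (43)-(45), by increasing alpha.
CHAINS = [
    [(F(1), F(1, 3))],                                              # (4,1,3)
    [(F(1, 2), F(1, 4)), (F(3, 5), F(1, 5))],                      # (4,2,3)
    [(F(1, 3), F(1, 3)), (F(3, 8), F(1, 4)), (F(1, 2), F(1, 6))],  # (4,3,3)
]
INF = float("inf")


def normal_ranges(chain):
    """For each vertex, the interval of slopes t = b/a where it minimizes a*alpha + b*beta."""
    cuts = [(q[0] - p[0]) / (p[1] - q[1]) for p, q in zip(chain, chain[1:])]
    bounds = [F(0)] + cuts + [INF]
    return list(zip(bounds[:-1], bounds[1:]))


def extreme_combinations():
    """Vertex choices (one per code) whose normal ranges overlap in an open interval."""
    ranges = [normal_ranges(chain) for chain in CHAINS]
    found = []
    for choice in product(*(range(len(chain)) for chain in CHAINS)):
        lo = max(ranges[c][i][0] for c, i in enumerate(choice))
        hi = min(ranges[c][i][1] for c, i in enumerate(choice))
        if lo < hi:
            found.append(tuple(CHAINS[c][i] for c, i in enumerate(choice)))
    return found


for combo in extreme_combinations():
    print([(str(a), str(b)) for a, b in combo])
```

It reports four combinations, in the order of (46)–(49). The two combinations that pair the (4, 2, 3) point $(\frac35, \frac15)$ with the first two (4, 3, 3) points are discarded, because their normal ranges ($t \ge 2$ versus $t \le \frac32$) do not overlap.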

² More precisely, such a point violates certain inequalities unless certain components of $(B_1, B_2, B_3)$ are zero; however, under these degenerate conditions, it reduces to one of the points given in (46)–(49).

Converse Proof of Theorem 2. To establish the converse of Theorem 2, we shall prove that every normalized rate pair $(\alpha, \beta) \in \mathcal{R}_3(B_1, B_2)$ must satisfy the inequalities in (5). The inequality $\alpha \ge B_1 + \frac{B_2}{2}$ holds even without the regeneration requirement [4], and the inequality $\alpha + \beta \ge \frac{3}{2}B_1 + B_2$ follows directly from Theorem 1 by setting $n = 3$. It remains to prove the inequality $\beta \ge \frac{B_1}{2} + \frac{B_2}{3}$, which can be shown as follows.

First note that the repair bandwidth $\beta$ can be bounded from below as follows:

$$\begin{aligned}
\beta &\ge \frac{1}{2}\left[H(S_{1\to3}) + H(S_{2\to3})\right] \\
&\ge \frac{1}{2}H(S_{1\to3}, S_{2\to3}) \\
&\overset{(a)}{=} \frac{1}{2}H(S_{1\to3}, S_{2\to3}, W_3, M_1) \\
&\ge \frac{1}{2}B_1 + \frac{1}{2}H(S_{1\to3}, S_{2\to3}, W_3 \mid M_1),
\end{aligned} \tag{50}$$

where (a) is due to the fact that the data stored at node three, $W_3$, can be regenerated from the helper messages $S_{1\to3}$ and $S_{2\to3}$, and $M_1$ can in turn be decoded from $W_3$. To proceed, we can further bound the second term on the right-hand side of (50) as follows:


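$$\begin{aligned}
H(S_{1\to3}, S_{2\to3}, W_3 \mid M_1)
&\overset{(a)}{=} H(S_{1\to3}, S_{2\to3}, W_3, S_{3\to1}, S_{3\to2} \mid M_1) \\
&\ge H(S_{1\to3}, S_{2\to3}, S_{3\to1}, S_{3\to2} \mid M_1) \\
&= \frac{1}{3}\Big[H(S_{1\to3}, S_{2\to3}, S_{3\to1}, S_{3\to2} \mid M_1) + H(S_{1\to2}, S_{3\to2}, S_{2\to1}, S_{2\to3} \mid M_1) \\
&\qquad + H(S_{3\to1}, S_{2\to1}, S_{1\to2}, S_{1\to3} \mid M_1)\Big] \\
&\overset{(b)}{\ge} \frac{1}{3}\Big[H(S_{1\to3}, S_{2\to3}, S_{3\to1}, S_{3\to2}, S_{1\to2}, S_{2\to1} \mid M_1) + H(S_{3\to2}, S_{2\to3} \mid M_1) \\
&\qquad + H(S_{3\to1}, S_{2\to1}, S_{1\to2}, S_{1\to3} \mid M_1)\Big] \\
&\overset{(c)}{\ge} \frac{1}{3}\Big[B_2 + H(S_{3\to2}, S_{2\to3} \mid M_1) + H(S_{3\to1}, S_{2\to1}, S_{1\to2}, S_{1\to3} \mid M_1)\Big] \\
&\ge \frac{1}{3}\Big[B_2 + H(S_{3\to2}, S_{2\to3}, S_{3\to1}, S_{2\to1}, S_{1\to2}, S_{1\to3} \mid M_1)\Big] \\
&\overset{(c)}{\ge} \frac{1}{3}\Big[B_2 + H(M_2 \mid M_1)\Big] \\
&= \frac{2}{3}B_2,
\end{aligned} \tag{51}$$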


where (a) is due to the fact that the helper messages $S_{3\to1}$ and $S_{3\to2}$ are functions of $W_3$, the subsequent equality follows from the symmetry of the solution, (b) follows from the submodularity of the entropy function, and (c) is because from $(S_{1\to3}, S_{2\to3}, S_{3\to1}, S_{2\to1})$ we can regenerate $(W_1, W_3)$ and subsequently decode $M_2$. Substituting (51) into (50) gives

$$\beta \ge \frac{1}{2}B_1 + \frac{1}{3}B_2. \tag{52}$$

Normalizing both sides by $B_1 + B_2$ completes the proof of the bound on $\beta$, and hence of the converse theorem. □


Proof of Corollary 1. Let us first show that when $B_2 B_3 = 0$, we have $\mathcal{R}_4(B_1, B_2, B_3) = \bar{\mathcal{R}}_4(B_1, B_2, B_3)$. Since $\bar{\mathcal{R}}_4(B_1, B_2, B_3) \subseteq \mathcal{R}_4(B_1, B_2, B_3)$ a priori, we only need to show that $\mathcal{R}_4(B_1, B_2, B_3) \subseteq \bar{\mathcal{R}}_4(B_1, B_2, B_3)$. Further note that the inequality (8) is the only one among the inequalities (6)–(10) that is not shared by the inequalities (16)–(21), so we only need to show that any normalized rate pair $(\alpha, \beta) \in \mathcal{R}_4(B_1, B_2, B_3)$ must satisfy the inequality (8) when $B_2 B_3 = 0$.

Note that when $B_2 = 0$, (8) follows directly from (19). On the other hand, when $B_3 = 0$, from (17) and (20) we have

$$4\alpha + 6\beta = \frac{2}{3}(2\alpha + \beta) + \frac{8}{3}(\alpha + 2\beta) \ge 6B_1 + \frac{7}{2}B_2, \tag{53}$$

which is (8) when $B_3 = 0$. This proves the “if” part of the corollary.
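The weights in (53) come from writing the direction $(4, 6)$ as a nonnegative combination of the directions $(2, 1)$ and $(1, 2)$ of (17) and (20). A minimal sketch with exact fractions, assuming the right-hand sides of (17), (20) and (8) in the forms used in this section (variable names are illustrative):

```python
from fractions import Fraction as F

# Right-hand-side coefficients of (B1, B2, B3), in the forms assumed in this section.
RHS_17 = (F(7, 3), F(5, 4), F(1))  # 2*alpha + beta   >= ...
RHS_20 = (F(5, 3), F(1), F(5, 6))  # alpha + 2*beta   >= ...
RHS_8 = (F(6), F(7, 2), F(3))      # 4*alpha + 6*beta >= ...  (separate coding)

# Weights u, v >= 0 with u*(2, 1) + v*(1, 2) = (4, 6): solve 2u + v = 4, u + 2v = 6.
u = F(2 * 4 - 6, 3)
v = F(2 * 6 - 4, 3)
implied = tuple(u * x + v * y for x, y in zip(RHS_17, RHS_20))

print("u, v =", u, v)
print("implied bound on 4*alpha + 6*beta:", [str(c) for c in implied])
print("inequality (8):                   ", [str(c) for c in RHS_8])
```

The $B_3$ coefficient of the implied bound is $\frac{26}{9} < 3$, which is why this route yields (8) only when $B_3 = 0$.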

To prove the “only if” part, we shall assume that $\mathcal{R}_4(B_1, B_2, B_3) = \bar{\mathcal{R}}_4(B_1, B_2, B_3)$ and $B_2 \neq 0$. Note that when $B_2 \neq 0$, the inequality (8) does not follow directly from the inequality (19). Since these two inequalities are “parallel”, neither can be active within its respective group of inequalities. Now consider the normalized rate pair

$$(\alpha, \beta) = \left(B_1 + \frac{B_2}{2} + \frac{7B_3}{18},\ \frac{B_1}{3} + \frac{B_2}{4} + \frac{2B_3}{9}\right).$$

It is straightforward to verify that the above point satisfies the inequalities (6), (7), (9) and (10). Since the inequality (8) must be inactive within its group, the above point must satisfy the inequality (8) as well, which immediately implies that $B_3 = 0$. We thus conclude that when $\mathcal{R}_4(B_1, B_2, B_3) = \bar{\mathcal{R}}_4(B_1, B_2, B_3)$, we must have $B_2 B_3 = 0$. This completes the proof of the “only if” part of the corollary. □
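The “straightforward” verification in the “only if” argument can be carried out mechanically. The sketch below assumes, as the proof suggests, that (6), (7), (9) and (10) coincide with (16), (17), (20) and (21), and that (8) is $4\alpha + 6\beta \ge 6B_1 + \frac72 B_2 + 3B_3$; the mapping of labels to inequalities is an assumption made for illustration. For each inequality it computes the slack coefficients of $(B_1, B_2, B_3)$ at the chosen point.

```python
from fractions import Fraction as F

# Assumed separate-coding inequalities: label -> (a, b, rhs), with
#   a*alpha + b*beta >= rhs[0]*B1 + rhs[1]*B2 + rhs[2]*B3.
SEPARATE = {
    "(6)": (1, 0, (F(1), F(1, 2), F(1, 3))),
    "(7)": (2, 1, (F(7, 3), F(5, 4), F(1))),
    "(8)": (4, 6, (F(6), F(7, 2), F(3))),
    "(9)": (1, 2, (F(5, 3), F(1), F(5, 6))),
    "(10)": (0, 1, (F(1, 3), F(1, 5), F(1, 6))),
}

# Coefficient vectors of (B1, B2, B3) for the rate pair used in the proof.
ALPHA = (F(1), F(1, 2), F(7, 18))
BETA = (F(1, 3), F(1, 4), F(2, 9))


def slack(label):
    """Coefficients s with lhs - rhs = s[0]*B1 + s[1]*B2 + s[2]*B3 at the chosen point."""
    a, b, rhs = SEPARATE[label]
    return tuple(a * x + b * y - r for x, y, r in zip(ALPHA, BETA, rhs))


for label in SEPARATE:
    print(label, [str(s) for s in slack(label)])
```

All slack coefficients are nonnegative except the $B_3$ coefficient $-\frac19$ of (8), so (8) holds at this point if and only if $B_3 = 0$.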